author     Julian Brown <julian@codesourcery.com>  2007-07-25 12:28:31 +0000
committer  Julian Brown <jules@gcc.gnu.org>        2007-07-25 12:28:31 +0000
commit     88f77cba027fa1be471081bcd2ec03392246af3a (patch)
tree       3d9e535e4852684293654d9dcacf4334f53ce19c /gcc/config
parent     15d92b36a1cc26f5eda0c09f3fa1c369d0a36260 (diff)
Makefile.in (TEXI_GCC_FILES): Add arm-neon-intrinsics.texi.
gcc/
* Makefile.in (TEXI_GCC_FILES): Add arm-neon-intrinsics.texi.
* config.gcc (arm*-*-*): Add arm_neon.h to extra headers.
(with_fpu): Allow --with-fpu=neon.
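
For a sense of what the new header enables, a minimal user-level sketch
(assuming the usual -mfpu=neon -mfloat-abi=softfp flags on a NEON-capable
target):

    /* neon-add.c: add two float vectors with the new intrinsics.  */
    #include <arm_neon.h>

    float32x4_t add4 (float32x4_t a, float32x4_t b)
    {
      return vaddq_f32 (a, b);  /* maps to a single vadd.f32 qN, qN, qN */
    }
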
* config/arm/aof.h (ADDITIONAL_REGISTER_NAMES): Add Q0-Q15.
* config/arm/aout.h (ADDITIONAL_REGISTER_NAMES): Add Q0-Q15.
* config/arm/arm-modes.def (EI, OI, CI, XI): New modes.
* config/arm/arm-protos.h (neon_immediate_valid_for_move)
(neon_immediate_valid_for_logic, neon_output_logic_immediate)
(neon_pairwise_reduce, neon_expand_vector_init, neon_reinterpret)
(neon_emit_pair_result_insn, neon_disambiguate_copy)
(neon_vector_mem_operand, neon_struct_mem_operand, output_move_quad)
(output_move_neon): Add prototypes.
* config/arm/arm.c (FL_NEON): New flag for NEON processor capability.
(all_fpus): Add FPUTYPE_NEON.
(fp_model_for_fpu): Add NEON field.
(arm_return_in_memory): Return vectors <= 16 bytes in ARM registers.
(arm_arg_partial_bytes): Allow NEON vectors to be passed partially
in registers.
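
An illustrative case of the new return convention (a 16-byte vector just
fits the four call-clobbered core registers mentioned in the patch):

    #include <arm_neon.h>

    /* 16 bytes: returned in ARM registers, not via memory.  */
    int32x4_t pass_through (int32x4_t x)
    {
      return x;
    }
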
(arm_legitimate_address_p): Don't support fancy addressing for NEON
structure moves.
(thumb2_legitimate_address_p): Likewise.
(neon_valid_immediate): Recognize and prepare constants suitable for
NEON instructions.
(neon_immediate_valid_for_move): New function. Recognize and prepare
immediates for NEON move instructions.
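
For instance (illustrative, matching variant 0 of the immediate table
embedded in the patch below), a uniform 32-bit splat of a small value is
a candidate for a single vmov.i32 rather than a constant-pool load:

    #include <arm_neon.h>

    uint32x4_t all_ab (void)
    {
      return vdupq_n_u32 (0xab);  /* candidate for vmov.i32 qN, #0xab */
    }
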
(neon_immediate_valid_for_logic): New function. Recognize and
prepare immediates for NEON logic instructions.
(neon_output_logic_immediate): New function. Create asm string
suitable for outputting immediate logic instructions.
(neon_pairwise_reduce): New function. Implement reduction using
pairwise operations.
(neon_expand_vector_init): New function. Expand a (possibly
non-constant) vector initialization.
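
A sketch of the kind of GNU C that reaches this expander (the current
implementation spills the elements to a stack slot and reloads, per the
FIXME in the patch):

    #include <arm_neon.h>

    int32x4_t make_vec (int32_t a, int32_t b, int32_t c, int32_t d)
    {
      int32x4_t v = { a, b, c, d };  /* non-constant initializer */
      return v;
    }
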
(neon_vector_mem_operand): New function. Memory operands supported
for quad-word loads/stores to/from ARM or NEON registers. Don't
allow base+offset addressing for core regs.
(neon_struct_mem_operand): New function. Valid mems for NEON
structure moves.
(coproc_secondary_reload_class): Enable NEON registers to be loaded
from neon_vector_mem_operand addresses without a secondary register.
(add_minipool_forward_ref): Handle >8-byte minipool entries.
(add_minipool_backward_ref): Likewise.
(dump_minipool): Likewise.
(push_minipool_fix): Likewise.
(output_move_quad): New function. Output quad-word moves, loads and
stores using ARM registers.
(output_move_vfp): Add support for vectors in VFP (NEON) D
registers.
(output_move_neon): Output a NEON load/store to/from a quadword
register.
(arm_print_operand): Implement new codes:
- 'c' for unadorned integers (without a # sign).
- 'J', 'K' for reg+2/reg+3, reg+3/reg+2 in little/big-endian
mode.
- 'e', 'f' for the low and high D parts of a NEON Q register.
- 'q' outputs a NEON Q register.
- 'h' outputs ranges of D registers for VLDM/VSTM etc.
- 'T' prints NEON opcode features from a coded bitmask.
- 'F' is similar to T, but signed/unsigned codes both print as
'i'.
- 't' is similar to T, but 'u' is printed instead of 'p'.
- 'O' prints 'r' if NEON instruction should perform rounding (as
specified by bitmask), else prints nothing.
- '#' is a punctuation character to stop operand numbers from
running together with following digits in the assembler
strings for instructions (when using mode attributes).
(arm_assemble_integer): Handle extra NEON vector modes. Permute
constant vectors in big-endian mode, where necessary.
(arm_hard_regno_mode_ok): Allow vectors in VFP/NEON registers.
Handle EI, OI, CI, XI modes.
(ashlv4hi3, ashlv2si3, lshrv4hi3, lshrv2si3, ashrv4hi3)
(ashrv2si3): Rename IWMMXT2_BUILTINs to...
(ashlv4hi3_iwmmxt, ashlv2si3_iwmmxt, lshrv4hi3_iwmmxt)
(lshrv2si3_iwmmxt, ashrv4hi3_iwmmxt, ashrv2si3_iwmmxt): New names.
(neon_builtin_type_bits): Add enumeration, one bit for each vector
type.
(v8qi_UP, v4hi_UP, v2si_UP, v2sf_UP, di_UP, v16qi_UP, v8hi_UP)
(v4si_UP, v4sf_UP, v2di_UP, ti_UP, ei_UP, oi_UP, UP): Define macros
to turn v8qi, etc. into bits defined above.
(neon_itype): New enumeration. Classifications of NEON builtins.
(neon_builtin_datum): Define struct. Contains information about
a single builtin (with multiple modes).
(CF): Define helper macro for...
(VAR1...VAR10): Define builtins with a type, name and 1-10 different
modes.
(neon_builtin_data): New array. Define information about builtins
for use during initialization/expansion.
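
As a concrete reading of those macros (expansion written out by hand from
the CF/UP definitions in the patch; illustrative, not generated output),
{ VAR3 (BINOP, vaddl, v8qi, v4hi, v2si) } becomes roughly:

    { "vaddl", NEON_BINOP, T_V8QI | T_V4HI | T_V2SI,
      { CODE_FOR_neon_vaddlv8qi, CODE_FOR_neon_vaddlv4hi,
        CODE_FOR_neon_vaddlv2si },
      3, 0 }
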
(arm_init_neon_builtins): New function.
(arm_init_builtins): Call arm_init_neon_builtins if TARGET_NEON is
true.
(neon_builtin_compare): New function.
(locate_neon_builtin_icode): New function. Find an insn code for a
builtin given a function code for that builtin. Also return type of
builtin (NEON_BINOP, NEON_UNOP etc.).
(builtin_arg): New enumeration. Types of arguments for builtins.
(arm_expand_neon_args): New function. Expand a generic NEON builtin.
Takes a variable argument list of builtin_arg types, terminated by
NEON_ARG_STOP.
(arm_expand_neon_builtin): New function. Expand a NEON builtin.
(neon_reinterpret): New function. Expand NEON reinterpret intrinsic.
(neon_emit_pair_result_insn): New function. Support returning pairs
of vectors via a pointer.
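
At the source level this backs the pair-result intrinsics; a hedged
sketch (the pair is returned through a pointer under the hood):

    #include <arm_neon.h>

    int32x2x2_t transpose2 (int32x2_t a, int32x2_t b)
    {
      return vtrn_s32 (a, b);  /* two result vectors, returned as a pair */
    }
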
(neon_disambiguate_copy): New function. Set up operands for a
multi-word copy such that registers do not get clobbered.
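
The underlying trick is the usual overlap-safe ordering, as in this
stand-alone sketch (hypothetical helper, not code from the patch):

    /* Copy n words so an overlapping source is never clobbered before
       it is read: ascend if dst starts below src, descend otherwise.  */
    static void copy_words (unsigned *dst, const unsigned *src, int n)
    {
      int i;
      if (dst < src)
        for (i = 0; i < n; i++)
          dst[i] = src[i];
      else
        for (i = n - 1; i >= 0; i--)
          dst[i] = src[i];
    }
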
(arm_expand_builtin): Call arm_expand_neon_builtin if fcode >=
ARM_BUILTIN_NEON_BASE.
(arm_file_start): Set float-abi attribute for NEON.
(arm_vector_mode_supported_p): Enable NEON vector modes.
(arm_mangle_map_entry): New.
(arm_mangle_map): New.
(arm_mangle_vector_type): New.
* config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Define __ARM_NEON__
when appropriate.
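
User code can key on the new macro in the usual way, for example:

    int have_neon (void)
    {
    #ifdef __ARM_NEON__
      return 1;   /* NEON intrinsics and arm_neon.h are available */
    #else
      return 0;
    #endif
    }
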
(TARGET_NEON): New macro. Target supports NEON.
(fputype): Add FPUTYPE_NEON.
(UNITS_PER_SIMD_WORD): Define. Allow quad-word registers to be used
for vectorization based on command-line arg.
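
A loop of the shape the vectorizer can now target (illustrative; compiled
with -ftree-vectorize -mfpu=neon, with the new -mvectorize-with-neon-quad
option below selecting quad-word registers):

    void scale (float *__restrict a, const float *__restrict b, int n)
    {
      int i;
      for (i = 0; i < n; i++)
        a[i] = b[i] * 2.0f;
    }
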
(NEON_REGNO_OK_FOR_NREGS): Define.
(VALID_NEON_DREG_MODE, VALID_NEON_QREG_MODE)
(VALID_NEON_STRUCT_MODE): Define.
(PRINT_OPERAND_PUNCT_VALID_P): '#' is valid punctuation.
(arm_builtins): Add ARM_BUILTIN_NEON_BASE.
* config/arm/arm.md (VUNSPEC_POOL_16): Insert constant for unspec.
(consttable_16): Add pattern for outputting 16-byte minipool
entries.
(movv2si, movv4hi, movv8qi): Remove blank expanders (redefined in
vec-common.md).
(vec-common.md, neon.md): Include md files.
* config/arm/arm.opt (mvectorize-with-neon-quad): Add option.
* config/arm/constraints.md (constraint "Dn", "Dl", "DL"): Define.
(memory_constraint "Ut", "Un", "Us"): Define.
* config/arm/iwmmxt.md (VMMX, VSHFT): New mode macros.
(MMX_char): New mode attribute.
(addv8qi3, addv4hi3, addv2si3): Remove. Replace with...
(*add<mode>3_iwmmxt): New insn pattern.
(subv8qi3, subv4hi3, subv2si3): Remove. Replace with...
(*sub<mode>3_iwmmxt): New insn pattern.
(mulv4hi3): Rename to...
(*mulv4hi3_iwmmxt): This.
(smaxv8qi3, smaxv4hi3, smaxv2si3, umaxv8qi3, umaxv4hi3)
(umaxv2si3, sminv8qi3, sminv4hi3, sminv2si3, uminv8qi3)
(uminv4hi3, uminv2si3): Remove. Replace with...
(*smax<mode>3_iwmmxt, *umax<mode>3_iwmmxt, *smin<mode>3_iwmmxt)
(*umin<mode>3_iwmmxt): These.
(ashrv4hi3, ashrv2si3, ashrdi3_iwmmxt): Replace with...
(ashr<mode>3_iwmmxt): This new pattern.
(lshrv4hi3, lshrv2si3, lshrdi3_iwmmxt): Replace with...
(lshr<mode>3_iwmmxt): This new pattern.
(ashlv4hi3, ashlv2si3, ashldi3_iwmmxt): Replace with...
(ashl<mode>3_iwmmxt): This new pattern.
* config/arm/neon-docgen.ml: New file. Generate documentation for
intrinsics.
* config/arm/neon-gen.ml: New file. Generate arm_neon.h header.
* config/arm/arm_neon.h: New (autogenerated).
* config/arm/neon-testgen.ml: New file. Generate NEON tests
automatically.
* config/arm/neon.md: New file. Define NEON instructions.
* config/arm/neon.ml: New file. Abstract description of NEON
instructions, used to generate arm_neon.h header, documentation and tests.
* config/arm/t-arm (MD_INCLUDES): Add vec-common.md, neon.md.
* config/arm/vec-common.md: New file. Shared parts for iWMMXt and NEON vector
support.
* doc/extend.texi (ARM Built-in Functions): Rename and remove
extraneous comma.
(ARM NEON Intrinsics): New subsection.
* doc/arm-neon-intrinsics.texi: New (autogenerated).
gcc/testsuite/
* gcc.dg/vect/vect.exp: Check is-effective-target arm_neon_hw.
* gcc.dg/vect/tree-vect.h: Check for NEON SIMD support.
* lib/gcc-dg.exp (cleanup-saved-temps): Fix comment.
* lib/target-supports.exp (check_effective_target_arm_neon_ok)
(check_effective_target_arm_neon_hw): New.
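
The shape of a test using the new checks (directive usage as in other
gcc.target tests; the body here is only illustrative):

    /* { dg-do compile } */
    /* { dg-require-effective-target arm_neon_ok } */
    /* { dg-options "-mfpu=neon -mfloat-abi=softfp" } */
    #include <arm_neon.h>

    uint8x8_t f (uint8x8_t a, uint8x8_t b)
    {
      return vadd_u8 (a, b);
    }
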
* gcc.target/arm/neon/neon.exp: New file.
* gcc.target/arm/neon/polytypes.c: New file.
* gcc.target/arm/neon/v*.c (1870 files): New (autogenerated).
Co-Authored-By: Joseph Myers <joseph@codesourcery.com>
Co-Authored-By: Mark Shinwell <shinwell@codesourcery.com>
Co-Authored-By: Paul Brook <paul@codesourcery.com>
From-SVN: r126911
Diffstat (limited to 'gcc/config')
 gcc/config/arm/aof.h           |    26
 gcc/config/arm/aout.h          |    24
 gcc/config/arm/arm-modes.def   |     8
 gcc/config/arm/arm-protos.h    |    17
 gcc/config/arm/arm.c           |  2058
 gcc/config/arm/arm.h           |    64
 gcc/config/arm/arm.md          |    64
 gcc/config/arm/arm.opt         |     4
 gcc/config/arm/arm_neon.h      | 12179
 gcc/config/arm/constraints.md  |    49
 gcc/config/arm/iwmmxt.md       |   245
 gcc/config/arm/neon-docgen.ml  |   337
 gcc/config/arm/neon-gen.ml     |   419
 gcc/config/arm/neon-testgen.ml |   277
 gcc/config/arm/neon.md         |  3948
 gcc/config/arm/neon.ml         |  1826
 gcc/config/arm/predicates.md   |    40
 gcc/config/arm/t-arm           |     2
 gcc/config/arm/vec-common.md   |   107
19 files changed, 21400 insertions, 294 deletions
diff --git a/gcc/config/arm/aof.h b/gcc/config/arm/aof.h index 71d87a5..eeb06fd 100644 --- a/gcc/config/arm/aof.h +++ b/gcc/config/arm/aof.h @@ -239,22 +239,30 @@ do { \ {"r13", 13}, {"sp", 13}, \ {"r14", 14}, {"lr", 14}, \ {"r15", 15}, {"pc", 15}, \ - {"d0", 63}, \ + {"d0", 63}, {"q0", 63}, \ {"d1", 65}, \ - {"d2", 67}, \ + {"d2", 67}, {"q1", 67}, \ {"d3", 69}, \ - {"d4", 71}, \ + {"d4", 71}, {"q2", 71}, \ {"d5", 73}, \ - {"d6", 75}, \ + {"d6", 75}, {"q3", 75}, \ {"d7", 77}, \ - {"d8", 79}, \ + {"d8", 79}, {"q4", 79}, \ {"d9", 81}, \ - {"d10", 83}, \ + {"d10", 83}, {"q5", 83}, \ {"d11", 85}, \ - {"d12", 87}, \ + {"d12", 87}, {"q6", 87}, \ {"d13", 89}, \ - {"d14", 91}, \ - {"d15", 93} \ + {"d14", 91}, {"q7", 91}, \ + {"d15", 93}, \ + {"q8", 95}, \ + {"q9", 99}, \ + {"q10", 103}, \ + {"q11", 107}, \ + {"q12", 111}, \ + {"q13", 115}, \ + {"q14", 119}, \ + {"q15", 123} \ } #define REGISTER_PREFIX "__" diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h index a47859a..460338d 100644 --- a/gcc/config/arm/aout.h +++ b/gcc/config/arm/aout.h @@ -165,22 +165,30 @@ {"mvdx13", 40}, \ {"mvdx14", 41}, \ {"mvdx15", 42}, \ - {"d0", 63}, \ + {"d0", 63}, {"q0", 63}, \ {"d1", 65}, \ - {"d2", 67}, \ + {"d2", 67}, {"q1", 67}, \ {"d3", 69}, \ - {"d4", 71}, \ + {"d4", 71}, {"q2", 71}, \ {"d5", 73}, \ - {"d6", 75}, \ + {"d6", 75}, {"q3", 75}, \ {"d7", 77}, \ - {"d8", 79}, \ + {"d8", 79}, {"q4", 79}, \ {"d9", 81}, \ - {"d10", 83}, \ + {"d10", 83}, {"q5", 83}, \ {"d11", 85}, \ - {"d12", 87}, \ + {"d12", 87}, {"q6", 87}, \ {"d13", 89}, \ - {"d14", 91}, \ + {"d14", 91}, {"q7", 91}, \ {"d15", 93}, \ + {"q8", 95}, \ + {"q9", 99}, \ + {"q10", 103}, \ + {"q11", 107}, \ + {"q12", 111}, \ + {"q13", 115}, \ + {"q14", 119}, \ + {"q15", 123} \ } #endif diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def index 10ba025..6f36e03 100644 --- a/gcc/config/arm/arm-modes.def +++ b/gcc/config/arm/arm-modes.def @@ -58,3 +58,11 @@ VECTOR_MODES (INT, 16); /* V16QI V8HI V4SI V2DI */ VECTOR_MODES (FLOAT, 8); /* V4HF V2SF */ VECTOR_MODES (FLOAT, 16); /* V8HF V4SF V2DF */ +/* Opaque integer modes for 3, 4, 6 or 8 Neon double registers (2 is + TImode). */ +INT_MODE (EI, 24); +INT_MODE (OI, 32); +INT_MODE (CI, 48); +/* ??? This should actually have 512 bits but the precision only has 9 + bits. 
*/ +FRACTIONAL_INT_MODE (XI, 511, 64); diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 98cb5ef..a877c6d 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -68,6 +68,19 @@ extern rtx thumb_legitimize_reload_address (rtx *, enum machine_mode, int, int, extern int arm_const_double_rtx (rtx); extern int neg_const_double_rtx_ok_for_fpa (rtx); extern int vfp3_const_double_rtx (rtx); +extern int neon_immediate_valid_for_move (rtx, enum machine_mode, rtx *, int *); +extern int neon_immediate_valid_for_logic (rtx, enum machine_mode, int, rtx *, + int *); +extern char *neon_output_logic_immediate (const char *, rtx *, + enum machine_mode, int, int); +extern void neon_pairwise_reduce (rtx, rtx, enum machine_mode, + rtx (*) (rtx, rtx, rtx)); +extern void neon_expand_vector_init (rtx, rtx); +extern void neon_reinterpret (rtx, rtx); +extern void neon_emit_pair_result_insn (enum machine_mode, + rtx (*) (rtx, rtx, rtx, rtx), + rtx, rtx, rtx); +extern void neon_disambiguate_copy (rtx *, rtx *, rtx *, unsigned int); extern enum reg_class coproc_secondary_reload_class (enum machine_mode, rtx, bool); extern bool arm_tls_referenced_p (rtx); @@ -75,6 +88,8 @@ extern bool arm_cannot_force_const_mem (rtx); extern int cirrus_memory_offset (rtx); extern int arm_coproc_mem_operand (rtx, bool); +extern int neon_vector_mem_operand (rtx, bool); +extern int neon_struct_mem_operand (rtx); extern int arm_no_early_store_addr_dep (rtx, rtx); extern int arm_no_early_alu_shift_dep (rtx, rtx); extern int arm_no_early_alu_shift_value_dep (rtx, rtx); @@ -113,7 +128,9 @@ extern const char *output_mov_long_double_arm_from_arm (rtx *); extern const char *output_mov_double_fpa_from_arm (rtx *); extern const char *output_mov_double_arm_from_fpa (rtx *); extern const char *output_move_double (rtx *); +extern const char *output_move_quad (rtx *); extern const char *output_move_vfp (rtx *operands); +extern const char *output_move_neon (rtx *operands); extern const char *output_add_immediate (rtx *); extern const char *arithmetic_instr (rtx, int); extern void output_ascii_pseudo_op (FILE *, const unsigned char *, int); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 36fba5b..94926d8 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -461,6 +461,7 @@ static int thumb_call_reg_needed; profile. */ #define FL_DIV (1 << 18) /* Hardware divide. */ #define FL_VFPV3 (1 << 19) /* Vector Floating Point V3. */ +#define FL_NEON (1 << 20) /* Neon instructions. */ #define FL_IWMMXT (1 << 29) /* XScale v2 or "Intel Wireless MMX technology". */ @@ -706,6 +707,7 @@ static const struct fpu_desc all_fpus[] = {"maverick", FPUTYPE_MAVERICK}, {"vfp", FPUTYPE_VFP}, {"vfp3", FPUTYPE_VFP3}, + {"neon", FPUTYPE_NEON} }; @@ -721,7 +723,8 @@ static const enum fputype fp_model_for_fpu[] = ARM_FP_MODEL_FPA, /* FPUTYPE_FPA_EMU3 */ ARM_FP_MODEL_MAVERICK, /* FPUTYPE_MAVERICK */ ARM_FP_MODEL_VFP, /* FPUTYPE_VFP */ - ARM_FP_MODEL_VFP /* FPUTYPE_VFP3 */ + ARM_FP_MODEL_VFP, /* FPUTYPE_VFP3 */ + ARM_FP_MODEL_VFP /* FPUTYPE_NEON */ }; @@ -2754,15 +2757,20 @@ arm_return_in_memory (tree type) { HOST_WIDE_INT size; + size = int_size_in_bytes (type); + + /* Vector values should be returned using ARM registers, not memory (unless + they're over 16 bytes, which will break since we only have four + call-clobbered registers to play with). 
*/ + if (TREE_CODE (type) == VECTOR_TYPE) + return (size < 0 || size > (4 * UNITS_PER_WORD)); + if (!AGGREGATE_TYPE_P (type) && - (TREE_CODE (type) != VECTOR_TYPE) && !(TARGET_AAPCS_BASED && TREE_CODE (type) == COMPLEX_TYPE)) /* All simple types are returned in registers. For AAPCS, complex types are treated the same as aggregates. */ return 0; - size = int_size_in_bytes (type); - if (arm_abi != ARM_ABI_APCS) { /* ATPCS and later return aggregate types in memory only if they are @@ -2770,11 +2778,6 @@ arm_return_in_memory (tree type) return (size < 0 || size > UNITS_PER_WORD); } - /* To maximize backwards compatibility with previous versions of gcc, - return vectors up to 4 words in registers. */ - if (TREE_CODE (type) == VECTOR_TYPE) - return (size < 0 || size > (4 * UNITS_PER_WORD)); - /* For the arm-wince targets we choose to be compatible with Microsoft's ARM and Thumb compilers, which always return aggregates in memory. */ #ifndef ARM_WINCE @@ -2988,7 +2991,7 @@ arm_arg_partial_bytes (CUMULATIVE_ARGS *pcum, enum machine_mode mode, { int nregs = pcum->nregs; - if (arm_vector_mode_supported_p (mode)) + if (TARGET_IWMMXT_ABI && arm_vector_mode_supported_p (mode)) return 0; if (NUM_ARG_REGS > nregs @@ -3787,7 +3790,7 @@ arm_legitimate_address_p (enum machine_mode mode, rtx x, RTX_CODE outer, && GET_CODE (XEXP (XEXP (x, 0), 1)) == CONST_INT))) return 1; - else if (mode == TImode) + else if (mode == TImode || (TARGET_NEON && VALID_NEON_STRUCT_MODE (mode))) return 0; else if (code == PLUS) @@ -3873,7 +3876,7 @@ thumb2_legitimate_address_p (enum machine_mode mode, rtx x, int strict_p) && GET_CODE (XEXP (XEXP (x, 0), 1)) == CONST_INT))) return 1; - else if (mode == TImode) + else if (mode == TImode || (TARGET_NEON && VALID_NEON_STRUCT_MODE (mode))) return 0; else if (code == PLUS) @@ -3916,6 +3919,13 @@ arm_legitimate_index_p (enum machine_mode mode, rtx index, RTX_CODE outer, && INTVAL (index) > -1024 && (INTVAL (index) & 3) == 0); + if (TARGET_NEON + && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))) + return (code == CONST_INT + && INTVAL (index) < 1016 + && INTVAL (index) > -1024 + && (INTVAL (index) & 3) == 0); + if (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode)) return (code == CONST_INT && INTVAL (index) < 1024 @@ -4026,6 +4036,13 @@ thumb2_legitimate_index_p (enum machine_mode mode, rtx index, int strict_p) && (INTVAL (index) & 3) == 0); } + if (TARGET_NEON + && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))) + return (code == CONST_INT + && INTVAL (index) < 1016 + && INTVAL (index) > -1024 + && (INTVAL (index) & 3) == 0); + if (arm_address_register_rtx_p (index, strict_p) && (GET_MODE_SIZE (mode) <= 4)) return 1; @@ -5863,6 +5880,357 @@ vfp3_const_double_rtx (rtx x) return vfp3_const_double_index (x) != -1; } +/* Recognize immediates which can be used in various Neon instructions. Legal + immediates are described by the following table (for VMVN variants, the + bitwise inverse of the constant shown is recognized. In either case, VMOV + is output and the correct instruction to use for a given constant is chosen + by the assembler). The constant shown is replicated across all elements of + the destination vector. 
+ + insn elems variant constant (binary) + ---- ----- ------- ----------------- + vmov i32 0 00000000 00000000 00000000 abcdefgh + vmov i32 1 00000000 00000000 abcdefgh 00000000 + vmov i32 2 00000000 abcdefgh 00000000 00000000 + vmov i32 3 abcdefgh 00000000 00000000 00000000 + vmov i16 4 00000000 abcdefgh + vmov i16 5 abcdefgh 00000000 + vmvn i32 6 00000000 00000000 00000000 abcdefgh + vmvn i32 7 00000000 00000000 abcdefgh 00000000 + vmvn i32 8 00000000 abcdefgh 00000000 00000000 + vmvn i32 9 abcdefgh 00000000 00000000 00000000 + vmvn i16 10 00000000 abcdefgh + vmvn i16 11 abcdefgh 00000000 + vmov i32 12 00000000 00000000 abcdefgh 11111111 + vmvn i32 13 00000000 00000000 abcdefgh 11111111 + vmov i32 14 00000000 abcdefgh 11111111 11111111 + vmvn i32 15 00000000 abcdefgh 11111111 11111111 + vmov i8 16 abcdefgh + vmov i64 17 aaaaaaaa bbbbbbbb cccccccc dddddddd + eeeeeeee ffffffff gggggggg hhhhhhhh + vmov f32 18 aBbbbbbc defgh000 00000000 00000000 + + For case 18, B = !b. Representable values are exactly those accepted by + vfp3_const_double_index, but are output as floating-point numbers rather + than indices. + + Variants 0-5 (inclusive) may also be used as immediates for the second + operand of VORR/VBIC instructions. + + The INVERSE argument causes the bitwise inverse of the given operand to be + recognized instead (used for recognizing legal immediates for the VAND/VORN + pseudo-instructions). If INVERSE is true, the value placed in *MODCONST is + *not* inverted (i.e. the pseudo-instruction forms vand/vorn should still be + output, rather than the real insns vbic/vorr). + + INVERSE makes no difference to the recognition of float vectors. + + The return value is the variant of immediate as shown in the above table, or + -1 if the given value doesn't match any of the listed patterns. +*/ +static int +neon_valid_immediate (rtx op, enum machine_mode mode, int inverse, + rtx *modconst, int *elementwidth) +{ +#define CHECK(STRIDE, ELSIZE, CLASS, TEST) \ + matches = 1; \ + for (i = 0; i < idx; i += (STRIDE)) \ + if (!(TEST)) \ + matches = 0; \ + if (matches) \ + { \ + immtype = (CLASS); \ + elsize = (ELSIZE); \ + break; \ + } + + unsigned int i, elsize, idx = 0, n_elts = CONST_VECTOR_NUNITS (op); + unsigned int innersize = GET_MODE_SIZE (GET_MODE_INNER (mode)); + unsigned char bytes[16]; + int immtype = -1, matches; + unsigned int invmask = inverse ? 0xff : 0; + + /* Vectors of float constants. */ + if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT) + { + rtx el0 = CONST_VECTOR_ELT (op, 0); + REAL_VALUE_TYPE r0; + + if (!vfp3_const_double_rtx (el0)) + return -1; + + REAL_VALUE_FROM_CONST_DOUBLE (r0, el0); + + for (i = 1; i < n_elts; i++) + { + rtx elt = CONST_VECTOR_ELT (op, i); + REAL_VALUE_TYPE re; + + REAL_VALUE_FROM_CONST_DOUBLE (re, elt); + + if (!REAL_VALUES_EQUAL (r0, re)) + return -1; + } + + if (modconst) + *modconst = CONST_VECTOR_ELT (op, 0); + + if (elementwidth) + *elementwidth = 0; + + return 18; + } + + /* Splat vector constant out into a byte vector. 
*/ + for (i = 0; i < n_elts; i++) + { + rtx el = CONST_VECTOR_ELT (op, i); + unsigned HOST_WIDE_INT elpart; + unsigned int part, parts; + + if (GET_CODE (el) == CONST_INT) + { + elpart = INTVAL (el); + parts = 1; + } + else if (GET_CODE (el) == CONST_DOUBLE) + { + elpart = CONST_DOUBLE_LOW (el); + parts = 2; + } + else + gcc_unreachable (); + + for (part = 0; part < parts; part++) + { + unsigned int byte; + for (byte = 0; byte < innersize; byte++) + { + bytes[idx++] = (elpart & 0xff) ^ invmask; + elpart >>= BITS_PER_UNIT; + } + if (GET_CODE (el) == CONST_DOUBLE) + elpart = CONST_DOUBLE_HIGH (el); + } + } + + /* Sanity check. */ + gcc_assert (idx == GET_MODE_SIZE (mode)); + + do + { + CHECK (4, 32, 0, bytes[i] == bytes[0] && bytes[i + 1] == 0 + && bytes[i + 2] == 0 && bytes[i + 3] == 0); + + CHECK (4, 32, 1, bytes[i] == 0 && bytes[i + 1] == bytes[1] + && bytes[i + 2] == 0 && bytes[i + 3] == 0); + + CHECK (4, 32, 2, bytes[i] == 0 && bytes[i + 1] == 0 + && bytes[i + 2] == bytes[2] && bytes[i + 3] == 0); + + CHECK (4, 32, 3, bytes[i] == 0 && bytes[i + 1] == 0 + && bytes[i + 2] == 0 && bytes[i + 3] == bytes[3]); + + CHECK (2, 16, 4, bytes[i] == bytes[0] && bytes[i + 1] == 0); + + CHECK (2, 16, 5, bytes[i] == 0 && bytes[i + 1] == bytes[1]); + + CHECK (4, 32, 6, bytes[i] == bytes[0] && bytes[i + 1] == 0xff + && bytes[i + 2] == 0xff && bytes[i + 3] == 0xff); + + CHECK (4, 32, 7, bytes[i] == 0xff && bytes[i + 1] == bytes[1] + && bytes[i + 2] == 0xff && bytes[i + 3] == 0xff); + + CHECK (4, 32, 8, bytes[i] == 0xff && bytes[i + 1] == 0xff + && bytes[i + 2] == bytes[2] && bytes[i + 3] == 0xff); + + CHECK (4, 32, 9, bytes[i] == 0xff && bytes[i + 1] == 0xff + && bytes[i + 2] == 0xff && bytes[i + 3] == bytes[3]); + + CHECK (2, 16, 10, bytes[i] == bytes[0] && bytes[i + 1] == 0xff); + + CHECK (2, 16, 11, bytes[i] == 0xff && bytes[i + 1] == bytes[1]); + + CHECK (4, 32, 12, bytes[i] == 0xff && bytes[i + 1] == bytes[1] + && bytes[i + 2] == 0 && bytes[i + 3] == 0); + + CHECK (4, 32, 13, bytes[i] == 0 && bytes[i + 1] == bytes[1] + && bytes[i + 2] == 0xff && bytes[i + 3] == 0xff); + + CHECK (4, 32, 14, bytes[i] == 0xff && bytes[i + 1] == 0xff + && bytes[i + 2] == bytes[2] && bytes[i + 3] == 0); + + CHECK (4, 32, 15, bytes[i] == 0 && bytes[i + 1] == 0 + && bytes[i + 2] == bytes[2] && bytes[i + 3] == 0xff); + + CHECK (1, 8, 16, bytes[i] == bytes[0]); + + CHECK (1, 64, 17, (bytes[i] == 0 || bytes[i] == 0xff) + && bytes[i] == bytes[(i + 8) % idx]); + } + while (0); + + if (immtype == -1) + return -1; + + if (elementwidth) + *elementwidth = elsize; + + if (modconst) + { + unsigned HOST_WIDE_INT imm = 0; + + /* Un-invert bytes of recognized vector, if neccessary. */ + if (invmask != 0) + for (i = 0; i < idx; i++) + bytes[i] ^= invmask; + + if (immtype == 17) + { + /* FIXME: Broken on 32-bit H_W_I hosts. */ + gcc_assert (sizeof (HOST_WIDE_INT) == 8); + + for (i = 0; i < 8; i++) + imm |= (unsigned HOST_WIDE_INT) (bytes[i] ? 0xff : 0) + << (i * BITS_PER_UNIT); + + *modconst = GEN_INT (imm); + } + else + { + unsigned HOST_WIDE_INT imm = 0; + + for (i = 0; i < elsize / BITS_PER_UNIT; i++) + imm |= (unsigned HOST_WIDE_INT) bytes[i] << (i * BITS_PER_UNIT); + + *modconst = GEN_INT (imm); + } + } + + return immtype; +#undef CHECK +} + +/* Return TRUE if rtx X is legal for use as either a Neon VMOV (or, implicitly, + VMVN) immediate. Write back width per element to *ELEMENTWIDTH (or zero for + float elements), and a modified constant (whatever should be output for a + VMOV) in *MODCONST. 
*/ + +int +neon_immediate_valid_for_move (rtx op, enum machine_mode mode, + rtx *modconst, int *elementwidth) +{ + rtx tmpconst; + int tmpwidth; + int retval = neon_valid_immediate (op, mode, 0, &tmpconst, &tmpwidth); + + if (retval == -1) + return 0; + + if (modconst) + *modconst = tmpconst; + + if (elementwidth) + *elementwidth = tmpwidth; + + return 1; +} + +/* Return TRUE if rtx X is legal for use in a VORR or VBIC instruction. If + the immediate is valid, write a constant suitable for using as an operand + to VORR/VBIC/VAND/VORN to *MODCONST and the corresponding element width to + *ELEMENTWIDTH. See neon_valid_immediate for description of INVERSE. */ + +int +neon_immediate_valid_for_logic (rtx op, enum machine_mode mode, int inverse, + rtx *modconst, int *elementwidth) +{ + rtx tmpconst; + int tmpwidth; + int retval = neon_valid_immediate (op, mode, inverse, &tmpconst, &tmpwidth); + + if (retval < 0 || retval > 5) + return 0; + + if (modconst) + *modconst = tmpconst; + + if (elementwidth) + *elementwidth = tmpwidth; + + return 1; +} + +/* Return a string suitable for output of Neon immediate logic operation + MNEM. */ + +char * +neon_output_logic_immediate (const char *mnem, rtx *op2, enum machine_mode mode, + int inverse, int quad) +{ + int width, is_valid; + static char templ[40]; + + is_valid = neon_immediate_valid_for_logic (*op2, mode, inverse, op2, &width); + + gcc_assert (is_valid != 0); + + if (quad) + sprintf (templ, "%s.i%d\t%%q0, %%2", mnem, width); + else + sprintf (templ, "%s.i%d\t%%P0, %%2", mnem, width); + + return templ; +} + +/* Output a sequence of pairwise operations to implement a reduction. + NOTE: We do "too much work" here, because pairwise operations work on two + registers-worth of operands in one go. Unfortunately we can't exploit those + extra calculations to do the full operation in fewer steps, I don't think. + Although all vector elements of the result but the first are ignored, we + actually calculate the same result in each of the elements. An alternative + such as initially loading a vector with zero to use as each of the second + operands would use up an additional register and take an extra instruction, + for no particular gain. */ + +void +neon_pairwise_reduce (rtx op0, rtx op1, enum machine_mode mode, + rtx (*reduc) (rtx, rtx, rtx)) +{ + enum machine_mode inner = GET_MODE_INNER (mode); + unsigned int i, parts = GET_MODE_SIZE (mode) / GET_MODE_SIZE (inner); + rtx tmpsum = op1; + + for (i = parts / 2; i >= 1; i /= 2) + { + rtx dest = (i == 1) ? op0 : gen_reg_rtx (mode); + emit_insn (reduc (dest, tmpsum, tmpsum)); + tmpsum = dest; + } +} + +/* Initialise a vector with non-constant elements. FIXME: We can do better + than the current implementation (building a vector on the stack and then + loading it) in many cases. See rs6000.c. */ + +void +neon_expand_vector_init (rtx target, rtx vals) +{ + enum machine_mode mode = GET_MODE (target); + enum machine_mode inner = GET_MODE_INNER (mode); + unsigned int i, n_elts = GET_MODE_NUNITS (mode); + rtx mem; + + gcc_assert (VECTOR_MODE_P (mode)); + + mem = assign_stack_temp (mode, GET_MODE_SIZE (mode), 0); + for (i = 0; i < n_elts; i++) + emit_move_insn (adjust_address_nv (mem, inner, i * GET_MODE_SIZE (inner)), + XVECEXP (vals, 0, i)); + + emit_move_insn (target, mem); +} + /* Predicates for `match_operand' and `match_operator'. */ @@ -5974,6 +6342,110 @@ arm_coproc_mem_operand (rtx op, bool wb) return FALSE; } +/* Return TRUE if OP is a memory operand which we can load or store a vector + to/from. 
If CORE is true, we're moving from ARM registers not Neon + registers. */ +int +neon_vector_mem_operand (rtx op, bool core) +{ + rtx ind; + + /* Reject eliminable registers. */ + if (! (reload_in_progress || reload_completed) + && ( reg_mentioned_p (frame_pointer_rtx, op) + || reg_mentioned_p (arg_pointer_rtx, op) + || reg_mentioned_p (virtual_incoming_args_rtx, op) + || reg_mentioned_p (virtual_outgoing_args_rtx, op) + || reg_mentioned_p (virtual_stack_dynamic_rtx, op) + || reg_mentioned_p (virtual_stack_vars_rtx, op))) + return FALSE; + + /* Constants are converted into offsets from labels. */ + if (GET_CODE (op) != MEM) + return FALSE; + + ind = XEXP (op, 0); + + if (reload_completed + && (GET_CODE (ind) == LABEL_REF + || (GET_CODE (ind) == CONST + && GET_CODE (XEXP (ind, 0)) == PLUS + && GET_CODE (XEXP (XEXP (ind, 0), 0)) == LABEL_REF + && GET_CODE (XEXP (XEXP (ind, 0), 1)) == CONST_INT))) + return TRUE; + + /* Match: (mem (reg)). */ + if (GET_CODE (ind) == REG) + return arm_address_register_rtx_p (ind, 0); + + /* Allow post-increment with Neon registers. */ + if (!core && GET_CODE (ind) == POST_INC) + return arm_address_register_rtx_p (XEXP (ind, 0), 0); + +#if 0 + /* FIXME: We can support this too if we use VLD1/VST1. */ + if (!core + && GET_CODE (ind) == POST_MODIFY + && arm_address_register_rtx_p (XEXP (ind, 0), 0) + && GET_CODE (XEXP (ind, 1)) == PLUS + && rtx_equal_p (XEXP (XEXP (ind, 1), 0), XEXP (ind, 0))) + ind = XEXP (ind, 1); +#endif + + /* Match: + (plus (reg) + (const)). */ + if (!core + && GET_CODE (ind) == PLUS + && GET_CODE (XEXP (ind, 0)) == REG + && REG_MODE_OK_FOR_BASE_P (XEXP (ind, 0), VOIDmode) + && GET_CODE (XEXP (ind, 1)) == CONST_INT + && INTVAL (XEXP (ind, 1)) > -1024 + && INTVAL (XEXP (ind, 1)) < 1016 + && (INTVAL (XEXP (ind, 1)) & 3) == 0) + return TRUE; + + return FALSE; +} + +/* Return TRUE if OP is a mem suitable for loading/storing a Neon struct + type. */ +int +neon_struct_mem_operand (rtx op) +{ + rtx ind; + + /* Reject eliminable registers. */ + if (! (reload_in_progress || reload_completed) + && ( reg_mentioned_p (frame_pointer_rtx, op) + || reg_mentioned_p (arg_pointer_rtx, op) + || reg_mentioned_p (virtual_incoming_args_rtx, op) + || reg_mentioned_p (virtual_outgoing_args_rtx, op) + || reg_mentioned_p (virtual_stack_dynamic_rtx, op) + || reg_mentioned_p (virtual_stack_vars_rtx, op))) + return FALSE; + + /* Constants are converted into offsets from labels. */ + if (GET_CODE (op) != MEM) + return FALSE; + + ind = XEXP (op, 0); + + if (reload_completed + && (GET_CODE (ind) == LABEL_REF + || (GET_CODE (ind) == CONST + && GET_CODE (XEXP (ind, 0)) == PLUS + && GET_CODE (XEXP (XEXP (ind, 0), 0)) == LABEL_REF + && GET_CODE (XEXP (XEXP (ind, 0), 1)) == CONST_INT))) + return TRUE; + + /* Match: (mem (reg)). */ + if (GET_CODE (ind) == REG) + return arm_address_register_rtx_p (ind, 0); + + return FALSE; +} + /* Return true if X is a register that will be eliminated later on. */ int arm_eliminable_register (rtx x) @@ -5990,6 +6462,12 @@ arm_eliminable_register (rtx x) enum reg_class coproc_secondary_reload_class (enum machine_mode mode, rtx x, bool wb) { + if (TARGET_NEON + && (GET_MODE_CLASS (mode) == MODE_VECTOR_INT + || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT) + && neon_vector_mem_operand (x, FALSE)) + return NO_REGS; + if (arm_coproc_mem_operand (x, wb) || s_register_operand (x, mode)) return NO_REGS; @@ -8055,8 +8533,8 @@ add_minipool_forward_ref (Mfix *fix) placed at the start of the pool. 
*/ if (ARM_DOUBLEWORD_ALIGN && max_mp == NULL - && fix->fix_size == 8 - && mp->fix_size != 8) + && fix->fix_size >= 8 + && mp->fix_size < 8) { max_mp = mp; max_address = mp->max_address; @@ -8236,7 +8714,7 @@ add_minipool_backward_ref (Mfix *fix) /* For now, we do not allow the insertion of 8-byte alignment requiring nodes anywhere but at the start of the pool. */ if (ARM_DOUBLEWORD_ALIGN - && fix->fix_size == 8 && mp->fix_size != 8) + && fix->fix_size >= 8 && mp->fix_size < 8) return NULL; else min_mp = mp; @@ -8257,7 +8735,7 @@ add_minipool_backward_ref (Mfix *fix) placed at the start of the pool. */ else if (ARM_DOUBLEWORD_ALIGN && min_mp == NULL - && fix->fix_size == 8 + && fix->fix_size >= 8 && mp->fix_size < 8) { min_mp = mp; @@ -8355,7 +8833,7 @@ dump_minipool (rtx scan) if (ARM_DOUBLEWORD_ALIGN) for (mp = minipool_vector_head; mp != NULL; mp = mp->next) - if (mp->refcount > 0 && mp->fix_size == 8) + if (mp->refcount > 0 && mp->fix_size >= 8) { align64 = 1; break; @@ -8410,6 +8888,12 @@ dump_minipool (rtx scan) break; #endif +#ifdef HAVE_consttable_16 + case 16: + scan = emit_insn_after (gen_consttable_16 (mp->value), scan); + break; + +#endif default: gcc_unreachable (); } @@ -8602,7 +9086,7 @@ push_minipool_fix (rtx insn, HOST_WIDE_INT address, rtx *loc, /* If an entry requires 8-byte alignment then assume all constant pools require 4 bytes of padding. Trying to do this later on a per-pool basis is awkward because existing pool entries have to be modified. */ - if (ARM_DOUBLEWORD_ALIGN && fix->fix_size == 8) + if (ARM_DOUBLEWORD_ALIGN && fix->fix_size >= 8) minipool_pad = 4; if (dump_file) @@ -9655,6 +10139,82 @@ output_move_double (rtx *operands) return ""; } +/* Output a move, load or store for quad-word vectors in ARM registers. Only + handles MEMs accepted by neon_vector_mem_operand with CORE=true. */ + +const char * +output_move_quad (rtx *operands) +{ + if (REG_P (operands[0])) + { + /* Load, or reg->reg move. */ + + if (MEM_P (operands[1])) + { + switch (GET_CODE (XEXP (operands[1], 0))) + { + case REG: + output_asm_insn ("ldm%(ia%)\t%m1, %M0", operands); + break; + + case LABEL_REF: + case CONST: + output_asm_insn ("adr%?\t%0, %1", operands); + output_asm_insn ("ldm%(ia%)\t%0, %M0", operands); + break; + + default: + gcc_unreachable (); + } + } + else + { + rtx ops[2]; + int dest, src, i; + + gcc_assert (REG_P (operands[1])); + + dest = REGNO (operands[0]); + src = REGNO (operands[1]); + + /* This seems pretty dumb, but hopefully GCC won't try to do it + very often. */ + if (dest < src) + for (i = 0; i < 4; i++) + { + ops[0] = gen_rtx_REG (SImode, dest + i); + ops[1] = gen_rtx_REG (SImode, src + i); + output_asm_insn ("mov%?\t%0, %1", ops); + } + else + for (i = 3; i >= 0; i--) + { + ops[0] = gen_rtx_REG (SImode, dest + i); + ops[1] = gen_rtx_REG (SImode, src + i); + output_asm_insn ("mov%?\t%0, %1", ops); + } + } + } + else + { + gcc_assert (MEM_P (operands[0])); + gcc_assert (REG_P (operands[1])); + gcc_assert (!reg_overlap_mentioned_p (operands[1], operands[0])); + + switch (GET_CODE (XEXP (operands[0], 0))) + { + case REG: + output_asm_insn ("stm%(ia%)\t%m0, %M1", operands); + break; + + default: + gcc_unreachable (); + } + } + + return ""; +} + /* Output a VFP load or store instruction. 
*/ const char * @@ -9666,16 +10226,20 @@ output_move_vfp (rtx *operands) int integer_p = GET_MODE_CLASS (GET_MODE (operands[0])) == MODE_INT; const char *template; char buff[50]; + enum machine_mode mode; reg = operands[!load]; mem = operands[load]; + mode = GET_MODE (reg); + gcc_assert (REG_P (reg)); gcc_assert (IS_VFP_REGNUM (REGNO (reg))); - gcc_assert (GET_MODE (reg) == SFmode - || GET_MODE (reg) == DFmode - || GET_MODE (reg) == SImode - || GET_MODE (reg) == DImode); + gcc_assert (mode == SFmode + || mode == DFmode + || mode == SImode + || mode == DImode + || (TARGET_NEON && VALID_NEON_DREG_MODE (mode))); gcc_assert (MEM_P (mem)); addr = XEXP (mem, 0); @@ -9711,6 +10275,118 @@ output_move_vfp (rtx *operands) return ""; } +/* Output a Neon quad-word load or store, or a load or store for + larger structure modes. We could also support post-modify forms using + VLD1/VST1 (for the vectorizer, and perhaps otherwise), but we don't do that + yet. + WARNING: The ordering of elements in memory is weird in big-endian mode, + because we use VSTM instead of VST1, to make it easy to make vector stores + via ARM registers write values in the same order as stores direct from Neon + registers. For example, the byte ordering of a quadword vector with 16-byte + elements like this: + + [e7:e6:e5:e4:e3:e2:e1:e0] (highest-numbered element first) + + will be (with lowest address first, h = most-significant byte, + l = least-significant byte of element): + + [e3h, e3l, e2h, e2l, e1h, e1l, e0h, e0l, + e7h, e7l, e6h, e6l, e5h, e5l, e4h, e4l] + + When necessary, quadword registers (dN, dN+1) are moved to ARM registers from + rN in the order: + + dN -> (rN+1, rN), dN+1 -> (rN+3, rN+2) + + So that STM/LDM can be used on vectors in ARM registers, and the same memory + layout will result as if VSTM/VLDM were used. */ + +const char * +output_move_neon (rtx *operands) +{ + rtx reg, mem, addr, ops[2]; + int regno, load = REG_P (operands[0]); + const char *template; + char buff[50]; + enum machine_mode mode; + + reg = operands[!load]; + mem = operands[load]; + + mode = GET_MODE (reg); + + gcc_assert (REG_P (reg)); + regno = REGNO (reg); + gcc_assert (VFP_REGNO_OK_FOR_DOUBLE (regno) + || NEON_REGNO_OK_FOR_QUAD (regno)); + gcc_assert (VALID_NEON_DREG_MODE (mode) + || VALID_NEON_QREG_MODE (mode) + || VALID_NEON_STRUCT_MODE (mode)); + gcc_assert (MEM_P (mem)); + + addr = XEXP (mem, 0); + + /* Strip off const from addresses like (const (plus (...))). */ + if (GET_CODE (addr) == CONST && GET_CODE (XEXP (addr, 0)) == PLUS) + addr = XEXP (addr, 0); + + switch (GET_CODE (addr)) + { + case POST_INC: + template = "v%smia%%?\t%%0!, %%h1"; + ops[0] = XEXP (addr, 0); + ops[1] = reg; + break; + + case POST_MODIFY: + /* FIXME: Not currently enabled in neon_vector_mem_operand. */ + gcc_unreachable (); + + case LABEL_REF: + case PLUS: + { + int nregs = HARD_REGNO_NREGS (REGNO (reg), mode) / 2; + int i; + int overlap = -1; + for (i = 0; i < nregs; i++) + { + /* We're only using DImode here because it's a convenient size. */ + ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * i); + ops[1] = adjust_address (mem, SImode, 8 * i); + if (reg_overlap_mentioned_p (ops[0], mem)) + { + gcc_assert (overlap == -1); + overlap = i; + } + else + { + sprintf (buff, "v%sr%%?\t%%P0, %%1", load ? "ld" : "st"); + output_asm_insn (buff, ops); + } + } + if (overlap != -1) + { + ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * overlap); + ops[1] = adjust_address (mem, SImode, 8 * overlap); + sprintf (buff, "v%sr%%?\t%%P0, %%1", load ? 
"ld" : "st"); + output_asm_insn (buff, ops); + } + + return ""; + } + + default: + template = "v%smia%%?\t%%m0, %%h1"; + ops[0] = mem; + ops[1] = reg; + } + + sprintf (buff, template, load ? "ld" : "st"); + output_asm_insn (buff, ops); + + return ""; +} + /* Output an ADD r, s, #n where n may be too big for one instruction. If adding zero to one register, output nothing. */ const char * @@ -11941,6 +12617,13 @@ arm_print_operand (FILE *stream, rtx x, int code) fputc('s', stream); break; + /* %# is a "break" sequence. It doesn't output anything, but is used to + seperate e.g. operand numbers from following text, if that text consists + of further digits which we don't want to be part of the operand + number. */ + case '#': + return; + case 'N': { REAL_VALUE_TYPE r; @@ -11950,6 +12633,12 @@ arm_print_operand (FILE *stream, rtx x, int code) } return; + /* An integer without a preceding # sign. */ + case 'c': + gcc_assert (GET_CODE (x) == CONST_INT); + fprintf (stream, HOST_WIDE_INT_PRINT_DEC, INTVAL (x)); + return; + case 'B': if (GET_CODE (x) == CONST_INT) { @@ -12068,6 +12757,26 @@ arm_print_operand (FILE *stream, rtx x, int code) asm_fprintf (stream, "%r", REGNO (x) + 1); return; + case 'J': + if (GET_CODE (x) != REG || REGNO (x) > LAST_ARM_REGNUM) + { + output_operand_lossage ("invalid operand for code '%c'", code); + return; + } + + asm_fprintf (stream, "%r", REGNO (x) + (WORDS_BIG_ENDIAN ? 3 : 2)); + return; + + case 'K': + if (GET_CODE (x) != REG || REGNO (x) > LAST_ARM_REGNUM) + { + output_operand_lossage ("invalid operand for code '%c'", code); + return; + } + + asm_fprintf (stream, "%r", REGNO (x) + (WORDS_BIG_ENDIAN ? 2 : 3)); + return; + case 'm': asm_fprintf (stream, "%r", GET_CODE (XEXP (x, 0)) == REG @@ -12080,6 +12789,19 @@ arm_print_operand (FILE *stream, rtx x, int code) REGNO (x) + ARM_NUM_REGS (GET_MODE (x)) - 1); return; + /* Like 'M', but writing doubleword vector registers, for use by Neon + insns. */ + case 'h': + { + int regno = (REGNO (x) - FIRST_VFP_REGNUM) / 2; + int numregs = ARM_NUM_REGS (GET_MODE (x)) / 2; + if (numregs == 1) + asm_fprintf (stream, "{d%d}", regno); + else + asm_fprintf (stream, "{d%d-d%d}", regno, regno + numregs - 1); + } + return; + case 'd': /* CONST_TRUE_RTX means always -- that's the default. */ if (x == const_true_rtx) @@ -12192,13 +12914,15 @@ arm_print_operand (FILE *stream, rtx x, int code) } return; - /* Print a VFP double precision register name. */ + /* Print a VFP/Neon double precision or quad precision register name. */ case 'P': + case 'q': { int mode = GET_MODE (x); - int num; + int is_quad = (code == 'q'); + int regno; - if (mode != DImode && mode != DFmode) + if (GET_MODE_SIZE (mode) != (is_quad ? 16 : 8)) { output_operand_lossage ("invalid operand for code '%c'", code); return; @@ -12211,14 +12935,48 @@ arm_print_operand (FILE *stream, rtx x, int code) return; } - num = REGNO(x) - FIRST_VFP_REGNUM; - if (num & 1) + regno = REGNO (x); + if ((is_quad && !NEON_REGNO_OK_FOR_QUAD (regno)) + || (!is_quad && !VFP_REGNO_OK_FOR_DOUBLE (regno))) { output_operand_lossage ("invalid operand for code '%c'", code); return; } - fprintf (stream, "d%d", num >> 1); + fprintf (stream, "%c%d", is_quad ? 'q' : 'd', + (regno - FIRST_VFP_REGNUM) >> (is_quad ? 2 : 1)); + } + return; + + /* These two codes print the low/high doubleword register of a Neon quad + register, respectively. For pair-structure types, can also print + low/high quadword registers. 
*/ + case 'e': + case 'f': + { + int mode = GET_MODE (x); + int regno; + + if ((GET_MODE_SIZE (mode) != 16 + && GET_MODE_SIZE (mode) != 32) || GET_CODE (x) != REG) + { + output_operand_lossage ("invalid operand for code '%c'", code); + return; + } + + regno = REGNO (x); + if (!NEON_REGNO_OK_FOR_QUAD (regno)) + { + output_operand_lossage ("invalid operand for code '%c'", code); + return; + } + + if (GET_MODE_SIZE (mode) == 16) + fprintf (stream, "d%d", ((regno - FIRST_VFP_REGNUM) >> 1) + + (code == 'f' ? 1 : 0)); + else + fprintf (stream, "q%d", ((regno - FIRST_VFP_REGNUM) >> 2) + + (code == 'f' ? 1 : 0)); } return; @@ -12232,6 +12990,47 @@ arm_print_operand (FILE *stream, rtx x, int code) } return; + /* Print bits representing opcode features for Neon. + + Bit 0 is 1 for signed, 0 for unsigned. Floats count as signed + and polynomials as unsigned. + + Bit 1 is 1 for floats and polynomials, 0 for ordinary integers. + + Bit 2 is 1 for rounding functions, 0 otherwise. */ + + /* Identify the type as 's', 'u', 'p' or 'f'. */ + case 'T': + { + HOST_WIDE_INT bits = INTVAL (x); + fputc ("uspf"[bits & 3], stream); + } + return; + + /* Likewise, but signed and unsigned integers are both 'i'. */ + case 'F': + { + HOST_WIDE_INT bits = INTVAL (x); + fputc ("iipf"[bits & 3], stream); + } + return; + + /* As for 'T', but emit 'u' instead of 'p'. */ + case 't': + { + HOST_WIDE_INT bits = INTVAL (x); + fputc ("usuf"[bits & 3], stream); + } + return; + + /* Bit 2: rounding (vs none). */ + case 'O': + { + HOST_WIDE_INT bits = INTVAL (x); + fputs ((bits & 4) != 0 ? "r" : "", stream); + } + return; + default: if (x == 0) { @@ -12251,7 +13050,15 @@ arm_print_operand (FILE *stream, rtx x, int code) break; case CONST_DOUBLE: - fprintf (stream, "#%s", fp_immediate_constant (x)); + if (TARGET_NEON) + { + char fpstr[20]; + real_to_decimal (fpstr, CONST_DOUBLE_REAL_VALUE (x), + sizeof (fpstr), 0, 1); + fprintf (stream, "#%s", fpstr); + } + else + fprintf (stream, "#%s", fp_immediate_constant (x)); break; default: @@ -12269,6 +13076,8 @@ arm_print_operand (FILE *stream, rtx x, int code) static bool arm_assemble_integer (rtx x, unsigned int size, int aligned_p) { + enum machine_mode mode; + if (size == UNITS_PER_WORD && aligned_p) { fputs ("\t.word\t", asm_out_file); @@ -12291,31 +13100,48 @@ arm_assemble_integer (rtx x, unsigned int size, int aligned_p) return true; } - if (arm_vector_mode_supported_p (GET_MODE (x))) + mode = GET_MODE (x); + + if (arm_vector_mode_supported_p (mode)) { int i, units; + unsigned int invmask = 0, parts_per_word; gcc_assert (GET_CODE (x) == CONST_VECTOR); units = CONST_VECTOR_NUNITS (x); + size = GET_MODE_SIZE (GET_MODE_INNER (mode)); - switch (GET_MODE (x)) - { - case V2SImode: size = 4; break; - case V4HImode: size = 2; break; - case V8QImode: size = 1; break; - default: - gcc_unreachable (); - } + /* For big-endian Neon vectors, we must permute the vector to the form + which, when loaded by a VLDR or VLDM instruction, will give a vector + with the elements in the right order. */ + if (TARGET_NEON && WORDS_BIG_ENDIAN) + { + parts_per_word = UNITS_PER_WORD / size; + /* FIXME: This might be wrong for 64-bit vector elements, but we don't + support those anywhere yet. */ + invmask = (parts_per_word == 0) ? 0 : (1 << (parts_per_word - 1)) - 1; + } - for (i = 0; i < units; i++) - { - rtx elt; + if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT) + for (i = 0; i < units; i++) + { + rtx elt = CONST_VECTOR_ELT (x, i ^ invmask); + assemble_integer + (elt, size, i == 0 ? 
BIGGEST_ALIGNMENT : size * BITS_PER_UNIT, 1); + } + else + for (i = 0; i < units; i++) + { + rtx elt = CONST_VECTOR_ELT (x, i); + REAL_VALUE_TYPE rval; - elt = CONST_VECTOR_ELT (x, i); - assemble_integer - (elt, size, i == 0 ? BIGGEST_ALIGNMENT : size * BITS_PER_UNIT, 1); - } + REAL_VALUE_FROM_CONST_DOUBLE (rval, elt); + + assemble_real + (rval, GET_MODE_INNER (mode), + i == 0 ? BIGGEST_ALIGNMENT : size * BITS_PER_UNIT); + } return true; } @@ -13015,6 +13841,17 @@ arm_hard_regno_mode_ok (unsigned int regno, enum machine_mode mode) if (mode == DFmode) return VFP_REGNO_OK_FOR_DOUBLE (regno); + + if (TARGET_NEON) + return (VALID_NEON_DREG_MODE (mode) && VFP_REGNO_OK_FOR_DOUBLE (regno)) + || (VALID_NEON_QREG_MODE (mode) + && NEON_REGNO_OK_FOR_QUAD (regno)) + || (mode == TImode && NEON_REGNO_OK_FOR_NREGS (regno, 2)) + || (mode == EImode && NEON_REGNO_OK_FOR_NREGS (regno, 3)) + || (mode == OImode && NEON_REGNO_OK_FOR_NREGS (regno, 4)) + || (mode == CImode && NEON_REGNO_OK_FOR_NREGS (regno, 6)) + || (mode == XImode && NEON_REGNO_OK_FOR_NREGS (regno, 8)); + return FALSE; } @@ -13029,9 +13866,11 @@ arm_hard_regno_mode_ok (unsigned int regno, enum machine_mode mode) /* We allow any value to be stored in the general registers. Restrict doubleword quantities to even register pairs so that we can - use ldrd. */ + use ldrd. Do not allow Neon structure opaque modes in general registers; + they would use too many. */ if (regno <= LAST_ARM_REGNUM) - return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1) != 0); + return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1) != 0) + && !VALID_NEON_STRUCT_MODE (mode); if (regno == FRAME_POINTER_REGNUM || regno == ARG_POINTER_REGNUM) @@ -13273,21 +14112,21 @@ static const struct builtin_description bdesc_2arg[] = IWMMXT_BUILTIN2 (iwmmxt_wpackwus, WPACKWUS) IWMMXT_BUILTIN2 (iwmmxt_wpackdus, WPACKDUS) IWMMXT_BUILTIN2 (ashlv4hi3_di, WSLLH) - IWMMXT_BUILTIN2 (ashlv4hi3, WSLLHI) + IWMMXT_BUILTIN2 (ashlv4hi3_iwmmxt, WSLLHI) IWMMXT_BUILTIN2 (ashlv2si3_di, WSLLW) - IWMMXT_BUILTIN2 (ashlv2si3, WSLLWI) + IWMMXT_BUILTIN2 (ashlv2si3_iwmmxt, WSLLWI) IWMMXT_BUILTIN2 (ashldi3_di, WSLLD) IWMMXT_BUILTIN2 (ashldi3_iwmmxt, WSLLDI) IWMMXT_BUILTIN2 (lshrv4hi3_di, WSRLH) - IWMMXT_BUILTIN2 (lshrv4hi3, WSRLHI) + IWMMXT_BUILTIN2 (lshrv4hi3_iwmmxt, WSRLHI) IWMMXT_BUILTIN2 (lshrv2si3_di, WSRLW) - IWMMXT_BUILTIN2 (lshrv2si3, WSRLWI) + IWMMXT_BUILTIN2 (lshrv2si3_iwmmxt, WSRLWI) IWMMXT_BUILTIN2 (lshrdi3_di, WSRLD) IWMMXT_BUILTIN2 (lshrdi3_iwmmxt, WSRLDI) IWMMXT_BUILTIN2 (ashrv4hi3_di, WSRAH) - IWMMXT_BUILTIN2 (ashrv4hi3, WSRAHI) + IWMMXT_BUILTIN2 (ashrv4hi3_iwmmxt, WSRAHI) IWMMXT_BUILTIN2 (ashrv2si3_di, WSRAW) - IWMMXT_BUILTIN2 (ashrv2si3, WSRAWI) + IWMMXT_BUILTIN2 (ashrv2si3_iwmmxt, WSRAWI) IWMMXT_BUILTIN2 (ashrdi3_di, WSRAD) IWMMXT_BUILTIN2 (ashrdi3_iwmmxt, WSRADI) IWMMXT_BUILTIN2 (rorv4hi3_di, WRORH) @@ -13661,6 +14500,766 @@ arm_init_tls_builtins (void) NULL, const_nothrow); } +typedef enum { + T_V8QI = 0x0001, + T_V4HI = 0x0002, + T_V2SI = 0x0004, + T_V2SF = 0x0008, + T_DI = 0x0010, + T_V16QI = 0x0020, + T_V8HI = 0x0040, + T_V4SI = 0x0080, + T_V4SF = 0x0100, + T_V2DI = 0x0200, + T_TI = 0x0400, + T_EI = 0x0800, + T_OI = 0x1000 +} neon_builtin_type_bits; + +#define v8qi_UP T_V8QI +#define v4hi_UP T_V4HI +#define v2si_UP T_V2SI +#define v2sf_UP T_V2SF +#define di_UP T_DI +#define v16qi_UP T_V16QI +#define v8hi_UP T_V8HI +#define v4si_UP T_V4SI +#define v4sf_UP T_V4SF +#define v2di_UP T_V2DI +#define ti_UP T_TI +#define ei_UP T_EI +#define oi_UP T_OI + +#define UP(X) X##_UP + 
+#define T_MAX 13 + +typedef enum { + NEON_BINOP, + NEON_TERNOP, + NEON_UNOP, + NEON_GETLANE, + NEON_SETLANE, + NEON_CREATE, + NEON_DUP, + NEON_DUPLANE, + NEON_COMBINE, + NEON_SPLIT, + NEON_LANEMUL, + NEON_LANEMULL, + NEON_LANEMULH, + NEON_LANEMAC, + NEON_SCALARMUL, + NEON_SCALARMULL, + NEON_SCALARMULH, + NEON_SCALARMAC, + NEON_CONVERT, + NEON_FIXCONV, + NEON_SELECT, + NEON_RESULTPAIR, + NEON_REINTERP, + NEON_VTBL, + NEON_VTBX, + NEON_LOAD1, + NEON_LOAD1LANE, + NEON_STORE1, + NEON_STORE1LANE, + NEON_LOADSTRUCT, + NEON_LOADSTRUCTLANE, + NEON_STORESTRUCT, + NEON_STORESTRUCTLANE, + NEON_LOGICBINOP, + NEON_SHIFTINSERT, + NEON_SHIFTIMM, + NEON_SHIFTACC +} neon_itype; + +typedef struct { + const char *name; + const neon_itype itype; + const neon_builtin_type_bits bits; + const enum insn_code codes[T_MAX]; + const unsigned int num_vars; + unsigned int base_fcode; +} neon_builtin_datum; + +#define CF(N,X) CODE_FOR_neon_##N##X + +#define VAR1(T, N, A) \ + #N, NEON_##T, UP (A), { CF (N, A) }, 1, 0 +#define VAR2(T, N, A, B) \ + #N, NEON_##T, UP (A) | UP (B), { CF (N, A), CF (N, B) }, 2, 0 +#define VAR3(T, N, A, B, C) \ + #N, NEON_##T, UP (A) | UP (B) | UP (C), \ + { CF (N, A), CF (N, B), CF (N, C) }, 3, 0 +#define VAR4(T, N, A, B, C, D) \ + #N, NEON_##T, UP (A) | UP (B) | UP (C) | UP (D), \ + { CF (N, A), CF (N, B), CF (N, C), CF (N, D) }, 4, 0 +#define VAR5(T, N, A, B, C, D, E) \ + #N, NEON_##T, UP (A) | UP (B) | UP (C) | UP (D) | UP (E), \ + { CF (N, A), CF (N, B), CF (N, C), CF (N, D), CF (N, E) }, 5, 0 +#define VAR6(T, N, A, B, C, D, E, F) \ + #N, NEON_##T, UP (A) | UP (B) | UP (C) | UP (D) | UP (E) | UP (F), \ + { CF (N, A), CF (N, B), CF (N, C), CF (N, D), CF (N, E), CF (N, F) }, 6, 0 +#define VAR7(T, N, A, B, C, D, E, F, G) \ + #N, NEON_##T, UP (A) | UP (B) | UP (C) | UP (D) | UP (E) | UP (F) | UP (G), \ + { CF (N, A), CF (N, B), CF (N, C), CF (N, D), CF (N, E), CF (N, F), \ + CF (N, G) }, 7, 0 +#define VAR8(T, N, A, B, C, D, E, F, G, H) \ + #N, NEON_##T, UP (A) | UP (B) | UP (C) | UP (D) | UP (E) | UP (F) | UP (G) \ + | UP (H), \ + { CF (N, A), CF (N, B), CF (N, C), CF (N, D), CF (N, E), CF (N, F), \ + CF (N, G), CF (N, H) }, 8, 0 +#define VAR9(T, N, A, B, C, D, E, F, G, H, I) \ + #N, NEON_##T, UP (A) | UP (B) | UP (C) | UP (D) | UP (E) | UP (F) | UP (G) \ + | UP (H) | UP (I), \ + { CF (N, A), CF (N, B), CF (N, C), CF (N, D), CF (N, E), CF (N, F), \ + CF (N, G), CF (N, H), CF (N, I) }, 9, 0 +#define VAR10(T, N, A, B, C, D, E, F, G, H, I, J) \ + #N, NEON_##T, UP (A) | UP (B) | UP (C) | UP (D) | UP (E) | UP (F) | UP (G) \ + | UP (H) | UP (I) | UP (J), \ + { CF (N, A), CF (N, B), CF (N, C), CF (N, D), CF (N, E), CF (N, F), \ + CF (N, G), CF (N, H), CF (N, I), CF (N, J) }, 10, 0 + +/* The mode entries in the following table correspond to the "key" type of the + instruction variant, i.e. equivalent to that which would be specified after + the assembler mnemonic, which usually refers to the last vector operand. + (Signed/unsigned/polynomial types are not differentiated between though, and + are all mapped onto the same mode for a given element size.) The modes + listed per instruction should be the same as those defined for that + instruction's pattern in neon.md. + WARNING: Variants should be listed in the same increasing order as + neon_builtin_type_bits. 
*/ + +static neon_builtin_datum neon_builtin_data[] = +{ + { VAR10 (BINOP, vadd, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR3 (BINOP, vaddl, v8qi, v4hi, v2si) }, + { VAR3 (BINOP, vaddw, v8qi, v4hi, v2si) }, + { VAR6 (BINOP, vhadd, v8qi, v4hi, v2si, v16qi, v8hi, v4si) }, + { VAR8 (BINOP, vqadd, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di) }, + { VAR3 (BINOP, vaddhn, v8hi, v4si, v2di) }, + { VAR8 (BINOP, vmul, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf) }, + { VAR8 (TERNOP, vmla, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf) }, + { VAR3 (TERNOP, vmlal, v8qi, v4hi, v2si) }, + { VAR8 (TERNOP, vmls, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf) }, + { VAR3 (TERNOP, vmlsl, v8qi, v4hi, v2si) }, + { VAR4 (BINOP, vqdmulh, v4hi, v2si, v8hi, v4si) }, + { VAR2 (TERNOP, vqdmlal, v4hi, v2si) }, + { VAR2 (TERNOP, vqdmlsl, v4hi, v2si) }, + { VAR3 (BINOP, vmull, v8qi, v4hi, v2si) }, + { VAR2 (SCALARMULL, vmull_n, v4hi, v2si) }, + { VAR2 (LANEMULL, vmull_lane, v4hi, v2si) }, + { VAR2 (SCALARMULL, vqdmull_n, v4hi, v2si) }, + { VAR2 (LANEMULL, vqdmull_lane, v4hi, v2si) }, + { VAR4 (SCALARMULH, vqdmulh_n, v4hi, v2si, v8hi, v4si) }, + { VAR4 (LANEMULH, vqdmulh_lane, v4hi, v2si, v8hi, v4si) }, + { VAR2 (BINOP, vqdmull, v4hi, v2si) }, + { VAR8 (BINOP, vshl, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di) }, + { VAR8 (BINOP, vqshl, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di) }, + { VAR8 (SHIFTIMM, vshr_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di) }, + { VAR3 (SHIFTIMM, vshrn_n, v8hi, v4si, v2di) }, + { VAR3 (SHIFTIMM, vqshrn_n, v8hi, v4si, v2di) }, + { VAR3 (SHIFTIMM, vqshrun_n, v8hi, v4si, v2di) }, + { VAR8 (SHIFTIMM, vshl_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di) }, + { VAR8 (SHIFTIMM, vqshl_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di) }, + { VAR8 (SHIFTIMM, vqshlu_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di) }, + { VAR3 (SHIFTIMM, vshll_n, v8qi, v4hi, v2si) }, + { VAR8 (SHIFTACC, vsra_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di) }, + { VAR10 (BINOP, vsub, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR3 (BINOP, vsubl, v8qi, v4hi, v2si) }, + { VAR3 (BINOP, vsubw, v8qi, v4hi, v2si) }, + { VAR8 (BINOP, vqsub, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di) }, + { VAR6 (BINOP, vhsub, v8qi, v4hi, v2si, v16qi, v8hi, v4si) }, + { VAR3 (BINOP, vsubhn, v8hi, v4si, v2di) }, + { VAR8 (BINOP, vceq, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf) }, + { VAR8 (BINOP, vcge, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf) }, + { VAR8 (BINOP, vcgt, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf) }, + { VAR2 (BINOP, vcage, v2sf, v4sf) }, + { VAR2 (BINOP, vcagt, v2sf, v4sf) }, + { VAR6 (BINOP, vtst, v8qi, v4hi, v2si, v16qi, v8hi, v4si) }, + { VAR8 (BINOP, vabd, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf) }, + { VAR3 (BINOP, vabdl, v8qi, v4hi, v2si) }, + { VAR6 (TERNOP, vaba, v8qi, v4hi, v2si, v16qi, v8hi, v4si) }, + { VAR3 (TERNOP, vabal, v8qi, v4hi, v2si) }, + { VAR8 (BINOP, vmax, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf) }, + { VAR8 (BINOP, vmin, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf) }, + { VAR4 (BINOP, vpadd, v8qi, v4hi, v2si, v2sf) }, + { VAR6 (UNOP, vpaddl, v8qi, v4hi, v2si, v16qi, v8hi, v4si) }, + { VAR6 (BINOP, vpadal, v8qi, v4hi, v2si, v16qi, v8hi, v4si) }, + { VAR4 (BINOP, vpmax, v8qi, v4hi, v2si, v2sf) }, + { VAR4 (BINOP, vpmin, v8qi, v4hi, v2si, v2sf) }, + { VAR2 (BINOP, vrecps, v2sf, v4sf) }, + { VAR2 (BINOP, vrsqrts, v2sf, v4sf) }, + { VAR8 (SHIFTINSERT, vsri_n, v8qi, v4hi, v2si, di, v16qi, v8hi, 
v4si, v2di) }, + { VAR8 (SHIFTINSERT, vsli_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di) }, + { VAR8 (UNOP, vabs, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf) }, + { VAR6 (UNOP, vqabs, v8qi, v4hi, v2si, v16qi, v8hi, v4si) }, + { VAR8 (UNOP, vneg, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf) }, + { VAR6 (UNOP, vqneg, v8qi, v4hi, v2si, v16qi, v8hi, v4si) }, + { VAR6 (UNOP, vcls, v8qi, v4hi, v2si, v16qi, v8hi, v4si) }, + { VAR6 (UNOP, vclz, v8qi, v4hi, v2si, v16qi, v8hi, v4si) }, + { VAR2 (UNOP, vcnt, v8qi, v16qi) }, + { VAR4 (UNOP, vrecpe, v2si, v2sf, v4si, v4sf) }, + { VAR4 (UNOP, vrsqrte, v2si, v2sf, v4si, v4sf) }, + { VAR6 (UNOP, vmvn, v8qi, v4hi, v2si, v16qi, v8hi, v4si) }, + /* FIXME: vget_lane supports more variants than this! */ + { VAR10 (GETLANE, vget_lane, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR10 (SETLANE, vset_lane, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR5 (CREATE, vcreate, v8qi, v4hi, v2si, v2sf, di) }, + { VAR10 (DUP, vdup_n, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR10 (DUPLANE, vdup_lane, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR5 (COMBINE, vcombine, v8qi, v4hi, v2si, v2sf, di) }, + { VAR5 (SPLIT, vget_high, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR5 (SPLIT, vget_low, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR3 (UNOP, vmovn, v8hi, v4si, v2di) }, + { VAR3 (UNOP, vqmovn, v8hi, v4si, v2di) }, + { VAR3 (UNOP, vqmovun, v8hi, v4si, v2di) }, + { VAR3 (UNOP, vmovl, v8qi, v4hi, v2si) }, + { VAR6 (LANEMUL, vmul_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf) }, + { VAR6 (LANEMAC, vmla_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf) }, + { VAR2 (LANEMAC, vmlal_lane, v4hi, v2si) }, + { VAR2 (LANEMAC, vqdmlal_lane, v4hi, v2si) }, + { VAR6 (LANEMAC, vmls_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf) }, + { VAR2 (LANEMAC, vmlsl_lane, v4hi, v2si) }, + { VAR2 (LANEMAC, vqdmlsl_lane, v4hi, v2si) }, + { VAR6 (SCALARMUL, vmul_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf) }, + { VAR6 (SCALARMAC, vmla_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf) }, + { VAR2 (SCALARMAC, vmlal_n, v4hi, v2si) }, + { VAR2 (SCALARMAC, vqdmlal_n, v4hi, v2si) }, + { VAR6 (SCALARMAC, vmls_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf) }, + { VAR2 (SCALARMAC, vmlsl_n, v4hi, v2si) }, + { VAR2 (SCALARMAC, vqdmlsl_n, v4hi, v2si) }, + { VAR10 (BINOP, vext, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR8 (UNOP, vrev64, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf) }, + { VAR4 (UNOP, vrev32, v8qi, v4hi, v16qi, v8hi) }, + { VAR2 (UNOP, vrev16, v8qi, v16qi) }, + { VAR4 (CONVERT, vcvt, v2si, v2sf, v4si, v4sf) }, + { VAR4 (FIXCONV, vcvt_n, v2si, v2sf, v4si, v4sf) }, + { VAR10 (SELECT, vbsl, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR1 (VTBL, vtbl1, v8qi) }, + { VAR1 (VTBL, vtbl2, v8qi) }, + { VAR1 (VTBL, vtbl3, v8qi) }, + { VAR1 (VTBL, vtbl4, v8qi) }, + { VAR1 (VTBX, vtbx1, v8qi) }, + { VAR1 (VTBX, vtbx2, v8qi) }, + { VAR1 (VTBX, vtbx3, v8qi) }, + { VAR1 (VTBX, vtbx4, v8qi) }, + { VAR8 (RESULTPAIR, vtrn, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf) }, + { VAR8 (RESULTPAIR, vzip, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf) }, + { VAR8 (RESULTPAIR, vuzp, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf) }, + { VAR5 (REINTERP, vreinterpretv8qi, v8qi, v4hi, v2si, v2sf, di) }, + { VAR5 (REINTERP, vreinterpretv4hi, v8qi, v4hi, v2si, v2sf, di) }, + { VAR5 (REINTERP, vreinterpretv2si, v8qi, v4hi, v2si, v2sf, di) }, + { VAR5 (REINTERP, vreinterpretv2sf, v8qi, v4hi, v2si, v2sf, di) }, 
+ { VAR5 (REINTERP, vreinterpretdi, v8qi, v4hi, v2si, v2sf, di) }, + { VAR5 (REINTERP, vreinterpretv16qi, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR5 (REINTERP, vreinterpretv8hi, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR5 (REINTERP, vreinterpretv4si, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR5 (REINTERP, vreinterpretv4sf, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR5 (REINTERP, vreinterpretv2di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR10 (LOAD1, vld1, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR10 (LOAD1LANE, vld1_lane, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR10 (LOAD1, vld1_dup, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR10 (STORE1, vst1, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR10 (STORE1LANE, vst1_lane, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR9 (LOADSTRUCT, + vld2, v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf) }, + { VAR7 (LOADSTRUCTLANE, vld2_lane, + v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf) }, + { VAR5 (LOADSTRUCT, vld2_dup, v8qi, v4hi, v2si, v2sf, di) }, + { VAR9 (STORESTRUCT, vst2, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf) }, + { VAR7 (STORESTRUCTLANE, vst2_lane, + v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf) }, + { VAR9 (LOADSTRUCT, + vld3, v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf) }, + { VAR7 (LOADSTRUCTLANE, vld3_lane, + v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf) }, + { VAR5 (LOADSTRUCT, vld3_dup, v8qi, v4hi, v2si, v2sf, di) }, + { VAR9 (STORESTRUCT, vst3, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf) }, + { VAR7 (STORESTRUCTLANE, vst3_lane, + v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf) }, + { VAR9 (LOADSTRUCT, vld4, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf) }, + { VAR7 (LOADSTRUCTLANE, vld4_lane, + v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf) }, + { VAR5 (LOADSTRUCT, vld4_dup, v8qi, v4hi, v2si, v2sf, di) }, + { VAR9 (STORESTRUCT, vst4, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf) }, + { VAR7 (STORESTRUCTLANE, vst4_lane, + v8qi, v4hi, v2si, v2sf, v8hi, v4si, v4sf) }, + { VAR10 (LOGICBINOP, vand, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR10 (LOGICBINOP, vorr, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR10 (BINOP, veor, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR10 (LOGICBINOP, vbic, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) }, + { VAR10 (LOGICBINOP, vorn, + v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di) } +}; + +#undef CF +#undef VAR1 +#undef VAR2 +#undef VAR3 +#undef VAR4 +#undef VAR5 +#undef VAR6 +#undef VAR7 +#undef VAR8 +#undef VAR9 +#undef VAR10 + +static void +arm_init_neon_builtins (void) +{ + unsigned int i, fcode = ARM_BUILTIN_NEON_BASE; + + /* Create distinguished type nodes for NEON vector element types, + and pointers to values of such types, so we can detect them later. 
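Before the type nodes are put to use, it helps to see the naming and function-code scheme that the registration loop at the end of this function implements: each table entry owns the dense fcode range [base_fcode, base_fcode + num_vars), and each variant's builtin is named after the entry plus the variant's mode name. A self-contained model of the scheme (illustrative only; the fcode base here is an arbitrary number, the real one is ARM_BUILTIN_NEON_BASE):

#include <stdio.h>

int main (void)
{
  static const char *const modenames[] = {
    "v8qi", "v4hi", "v2si", "v2sf", "di",
    "v16qi", "v8hi", "v4si", "v4sf", "v2di"
  };
  /* Model of VAR3 (BINOP, vaddl, v8qi, v4hi, v2si): bits 0..2 set.  */
  unsigned int bits = 0x7, fcode = 42, j;
  char namebuf[60];

  for (j = 0; j < 10; j++)
    if (bits & (1u << j))
      {
        sprintf (namebuf, "__builtin_neon_%s%s", "vaddl", modenames[j]);
        /* Prints e.g. "__builtin_neon_vaddlv8qi -> fcode 42".  */
        printf ("%s -> fcode %u\n", namebuf, fcode++);
      }
  return 0;
}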
*/ + tree neon_intQI_type_node = make_signed_type (GET_MODE_PRECISION (QImode)); + tree neon_intHI_type_node = make_signed_type (GET_MODE_PRECISION (HImode)); + tree neon_polyQI_type_node = make_signed_type (GET_MODE_PRECISION (QImode)); + tree neon_polyHI_type_node = make_signed_type (GET_MODE_PRECISION (HImode)); + tree neon_intSI_type_node = make_signed_type (GET_MODE_PRECISION (SImode)); + tree neon_intDI_type_node = make_signed_type (GET_MODE_PRECISION (DImode)); + tree neon_float_type_node = make_node (REAL_TYPE); + TYPE_PRECISION (neon_float_type_node) = FLOAT_TYPE_SIZE; + layout_type (neon_float_type_node); + + /* Define typedefs which exactly correspond to the modes we are basing vector + types on. If you change these names you'll need to change + the table used by arm_mangle_vector_type too. */ + (*lang_hooks.types.register_builtin_type) (neon_intQI_type_node, + "__builtin_neon_qi"); + (*lang_hooks.types.register_builtin_type) (neon_intHI_type_node, + "__builtin_neon_hi"); + (*lang_hooks.types.register_builtin_type) (neon_intSI_type_node, + "__builtin_neon_si"); + (*lang_hooks.types.register_builtin_type) (neon_float_type_node, + "__builtin_neon_sf"); + (*lang_hooks.types.register_builtin_type) (neon_intDI_type_node, + "__builtin_neon_di"); + + (*lang_hooks.types.register_builtin_type) (neon_polyQI_type_node, + "__builtin_neon_poly8"); + (*lang_hooks.types.register_builtin_type) (neon_polyHI_type_node, + "__builtin_neon_poly16"); + + tree intQI_pointer_node = build_pointer_type (neon_intQI_type_node); + tree intHI_pointer_node = build_pointer_type (neon_intHI_type_node); + tree intSI_pointer_node = build_pointer_type (neon_intSI_type_node); + tree intDI_pointer_node = build_pointer_type (neon_intDI_type_node); + tree float_pointer_node = build_pointer_type (neon_float_type_node); + + /* Next create constant-qualified versions of the above types. */ + tree const_intQI_node = build_qualified_type (neon_intQI_type_node, + TYPE_QUAL_CONST); + tree const_intHI_node = build_qualified_type (neon_intHI_type_node, + TYPE_QUAL_CONST); + tree const_intSI_node = build_qualified_type (neon_intSI_type_node, + TYPE_QUAL_CONST); + tree const_intDI_node = build_qualified_type (neon_intDI_type_node, + TYPE_QUAL_CONST); + tree const_float_node = build_qualified_type (neon_float_type_node, + TYPE_QUAL_CONST); + + tree const_intQI_pointer_node = build_pointer_type (const_intQI_node); + tree const_intHI_pointer_node = build_pointer_type (const_intHI_node); + tree const_intSI_pointer_node = build_pointer_type (const_intSI_node); + tree const_intDI_pointer_node = build_pointer_type (const_intDI_node); + tree const_float_pointer_node = build_pointer_type (const_float_node); + + /* Now create vector types based on our NEON element types. */ + /* 64-bit vectors. */ + tree V8QI_type_node = + build_vector_type_for_mode (neon_intQI_type_node, V8QImode); + tree V4HI_type_node = + build_vector_type_for_mode (neon_intHI_type_node, V4HImode); + tree V2SI_type_node = + build_vector_type_for_mode (neon_intSI_type_node, V2SImode); + tree V2SF_type_node = + build_vector_type_for_mode (neon_float_type_node, V2SFmode); + /* 128-bit vectors. 
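The const-qualified pointer types built above surface directly in user code: they become the address parameter of the load builtins, so the arm_neon.h wrappers accept pointers to const data without warnings. A user-level illustration (assuming a NEON-enabled compile, e.g. -mfloat-abi=softfp -mfpu=neon; vld1_s8 is defined in the generated header beyond the part quoted later in this patch):

#include <arm_neon.h>

int8x8_t
load_eight (const int8_t *p)
{
  /* The underlying builtin's pointer parameter is const-qualified,
     so a const source pointer is accepted as-is.  */
  return vld1_s8 (p);
}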
*/ + tree V16QI_type_node = + build_vector_type_for_mode (neon_intQI_type_node, V16QImode); + tree V8HI_type_node = + build_vector_type_for_mode (neon_intHI_type_node, V8HImode); + tree V4SI_type_node = + build_vector_type_for_mode (neon_intSI_type_node, V4SImode); + tree V4SF_type_node = + build_vector_type_for_mode (neon_float_type_node, V4SFmode); + tree V2DI_type_node = + build_vector_type_for_mode (neon_intDI_type_node, V2DImode); + + /* Unsigned integer types for various mode sizes. */ + tree intUQI_type_node = make_unsigned_type (GET_MODE_PRECISION (QImode)); + tree intUHI_type_node = make_unsigned_type (GET_MODE_PRECISION (HImode)); + tree intUSI_type_node = make_unsigned_type (GET_MODE_PRECISION (SImode)); + tree intUDI_type_node = make_unsigned_type (GET_MODE_PRECISION (DImode)); + + (*lang_hooks.types.register_builtin_type) (intUQI_type_node, + "__builtin_neon_uqi"); + (*lang_hooks.types.register_builtin_type) (intUHI_type_node, + "__builtin_neon_uhi"); + (*lang_hooks.types.register_builtin_type) (intUSI_type_node, + "__builtin_neon_usi"); + (*lang_hooks.types.register_builtin_type) (intUDI_type_node, + "__builtin_neon_udi"); + + /* Opaque integer types for structures of vectors. */ + tree intEI_type_node = make_signed_type (GET_MODE_PRECISION (EImode)); + tree intOI_type_node = make_signed_type (GET_MODE_PRECISION (OImode)); + tree intCI_type_node = make_signed_type (GET_MODE_PRECISION (CImode)); + tree intXI_type_node = make_signed_type (GET_MODE_PRECISION (XImode)); + + (*lang_hooks.types.register_builtin_type) (intTI_type_node, + "__builtin_neon_ti"); + (*lang_hooks.types.register_builtin_type) (intEI_type_node, + "__builtin_neon_ei"); + (*lang_hooks.types.register_builtin_type) (intOI_type_node, + "__builtin_neon_oi"); + (*lang_hooks.types.register_builtin_type) (intCI_type_node, + "__builtin_neon_ci"); + (*lang_hooks.types.register_builtin_type) (intXI_type_node, + "__builtin_neon_xi"); + + /* Pointers to vector types. */ + tree V8QI_pointer_node = build_pointer_type (V8QI_type_node); + tree V4HI_pointer_node = build_pointer_type (V4HI_type_node); + tree V2SI_pointer_node = build_pointer_type (V2SI_type_node); + tree V2SF_pointer_node = build_pointer_type (V2SF_type_node); + tree V16QI_pointer_node = build_pointer_type (V16QI_type_node); + tree V8HI_pointer_node = build_pointer_type (V8HI_type_node); + tree V4SI_pointer_node = build_pointer_type (V4SI_type_node); + tree V4SF_pointer_node = build_pointer_type (V4SF_type_node); + tree V2DI_pointer_node = build_pointer_type (V2DI_type_node); + + /* Operations which return results as pairs. 
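The pair-result function types declared next give the vtrn, vzip and vuzp builtins a void result plus a pointer through which both output vectors are written. The generated arm_neon.h (this excerpt ends before reaching these wrappers) wraps them along roughly the following lines; this is a sketch of the generated shape rather than the literal neon-gen.ml output:

__extension__ static __inline int8x8x2_t __attribute__ ((__always_inline__))
vtrn_s8 (int8x8_t __a, int8x8_t __b)
{
  int8x8x2_t __rv;
  /* The builtin stores both transposed vectors through the pointer;
     neon_emit_pair_result_insn (later in this patch) provides the
     expansion.  */
  __builtin_neon_vtrnv8qi (&__rv.val[0], __a, __b);
  return __rv;
}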
*/ + tree void_ftype_pv8qi_v8qi_v8qi = + build_function_type_list (void_type_node, V8QI_pointer_node, V8QI_type_node, + V8QI_type_node, NULL); + tree void_ftype_pv4hi_v4hi_v4hi = + build_function_type_list (void_type_node, V4HI_pointer_node, V4HI_type_node, + V4HI_type_node, NULL); + tree void_ftype_pv2si_v2si_v2si = + build_function_type_list (void_type_node, V2SI_pointer_node, V2SI_type_node, + V2SI_type_node, NULL); + tree void_ftype_pv2sf_v2sf_v2sf = + build_function_type_list (void_type_node, V2SF_pointer_node, V2SF_type_node, + V2SF_type_node, NULL); + tree void_ftype_pdi_di_di = + build_function_type_list (void_type_node, intDI_pointer_node, + neon_intDI_type_node, neon_intDI_type_node, NULL); + tree void_ftype_pv16qi_v16qi_v16qi = + build_function_type_list (void_type_node, V16QI_pointer_node, + V16QI_type_node, V16QI_type_node, NULL); + tree void_ftype_pv8hi_v8hi_v8hi = + build_function_type_list (void_type_node, V8HI_pointer_node, V8HI_type_node, + V8HI_type_node, NULL); + tree void_ftype_pv4si_v4si_v4si = + build_function_type_list (void_type_node, V4SI_pointer_node, V4SI_type_node, + V4SI_type_node, NULL); + tree void_ftype_pv4sf_v4sf_v4sf = + build_function_type_list (void_type_node, V4SF_pointer_node, V4SF_type_node, + V4SF_type_node, NULL); + tree void_ftype_pv2di_v2di_v2di = + build_function_type_list (void_type_node, V2DI_pointer_node, V2DI_type_node, + V2DI_type_node, NULL); + + tree reinterp_ftype_dreg[5][5]; + tree reinterp_ftype_qreg[5][5]; + tree dreg_types[5], qreg_types[5]; + + dreg_types[0] = V8QI_type_node; + dreg_types[1] = V4HI_type_node; + dreg_types[2] = V2SI_type_node; + dreg_types[3] = V2SF_type_node; + dreg_types[4] = neon_intDI_type_node; + + qreg_types[0] = V16QI_type_node; + qreg_types[1] = V8HI_type_node; + qreg_types[2] = V4SI_type_node; + qreg_types[3] = V4SF_type_node; + qreg_types[4] = V2DI_type_node; + + for (i = 0; i < 5; i++) + { + int j; + for (j = 0; j < 5; j++) + { + reinterp_ftype_dreg[i][j] + = build_function_type_list (dreg_types[i], dreg_types[j], NULL); + reinterp_ftype_qreg[i][j] + = build_function_type_list (qreg_types[i], qreg_types[j], NULL); + } + } + + for (i = 0; i < ARRAY_SIZE (neon_builtin_data); i++) + { + neon_builtin_datum *d = &neon_builtin_data[i]; + unsigned int j, codeidx = 0; + + d->base_fcode = fcode; + + for (j = 0; j < T_MAX; j++) + { + const char* const modenames[] = { + "v8qi", "v4hi", "v2si", "v2sf", "di", + "v16qi", "v8hi", "v4si", "v4sf", "v2di" + }; + char namebuf[60]; + tree ftype = NULL; + enum insn_code icode; + int is_load = 0, is_store = 0; + + if ((d->bits & (1 << j)) == 0) + continue; + + icode = d->codes[codeidx++]; + + switch (d->itype) + { + case NEON_LOAD1: + case NEON_LOAD1LANE: + case NEON_LOADSTRUCT: + case NEON_LOADSTRUCTLANE: + is_load = 1; + /* Fall through. */ + case NEON_STORE1: + case NEON_STORE1LANE: + case NEON_STORESTRUCT: + case NEON_STORESTRUCTLANE: + if (!is_load) + is_store = 1; + /* Fall through. 
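(Both fall-throughs deliberately funnel the load and store itypes into the shared type-construction case below: is_load and is_store stay zero for ordinary arithmetic builtins, and otherwise steer the SImode address operand, operand 1 for loads and operand 0 for stores, toward an element-pointer type instead of a plain integer.)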
*/ + case NEON_UNOP: + case NEON_BINOP: + case NEON_LOGICBINOP: + case NEON_SHIFTINSERT: + case NEON_TERNOP: + case NEON_GETLANE: + case NEON_SETLANE: + case NEON_CREATE: + case NEON_DUP: + case NEON_DUPLANE: + case NEON_SHIFTIMM: + case NEON_SHIFTACC: + case NEON_COMBINE: + case NEON_SPLIT: + case NEON_CONVERT: + case NEON_FIXCONV: + case NEON_LANEMUL: + case NEON_LANEMULL: + case NEON_LANEMULH: + case NEON_LANEMAC: + case NEON_SCALARMUL: + case NEON_SCALARMULL: + case NEON_SCALARMULH: + case NEON_SCALARMAC: + case NEON_SELECT: + case NEON_VTBL: + case NEON_VTBX: + { + int k; + tree return_type = void_type_node, args = void_list_node; + + /* Build a function type directly from the insn_data for this + builtin. The build_function_type() function takes care of + removing duplicates for us. */ + for (k = insn_data[icode].n_operands - 1; k >= 0; k--) + { + tree eltype; + + if (is_load && k == 1) + { + /* Neon load patterns always have the memory operand + (a SImode pointer) in the operand 1 position. We + want a const pointer to the element type in that + position. */ + gcc_assert (insn_data[icode].operand[k].mode == SImode); + + switch (1 << j) + { + case T_V8QI: + case T_V16QI: + eltype = const_intQI_pointer_node; + break; + + case T_V4HI: + case T_V8HI: + eltype = const_intHI_pointer_node; + break; + + case T_V2SI: + case T_V4SI: + eltype = const_intSI_pointer_node; + break; + + case T_V2SF: + case T_V4SF: + eltype = const_float_pointer_node; + break; + + case T_DI: + case T_V2DI: + eltype = const_intDI_pointer_node; + break; + + default: gcc_unreachable (); + } + } + else if (is_store && k == 0) + { + /* Similarly, Neon store patterns use operand 0 as + the memory location to store to (a SImode pointer). + Use a pointer to the element type of the store in + that position. */ + gcc_assert (insn_data[icode].operand[k].mode == SImode); + + switch (1 << j) + { + case T_V8QI: + case T_V16QI: + eltype = intQI_pointer_node; + break; + + case T_V4HI: + case T_V8HI: + eltype = intHI_pointer_node; + break; + + case T_V2SI: + case T_V4SI: + eltype = intSI_pointer_node; + break; + + case T_V2SF: + case T_V4SF: + eltype = float_pointer_node; + break; + + case T_DI: + case T_V2DI: + eltype = intDI_pointer_node; + break; + + default: gcc_unreachable (); + } + } + else + { + switch (insn_data[icode].operand[k].mode) + { + case VOIDmode: eltype = void_type_node; break; + /* Scalars. */ + case QImode: eltype = neon_intQI_type_node; break; + case HImode: eltype = neon_intHI_type_node; break; + case SImode: eltype = neon_intSI_type_node; break; + case SFmode: eltype = neon_float_type_node; break; + case DImode: eltype = neon_intDI_type_node; break; + case TImode: eltype = intTI_type_node; break; + case EImode: eltype = intEI_type_node; break; + case OImode: eltype = intOI_type_node; break; + case CImode: eltype = intCI_type_node; break; + case XImode: eltype = intXI_type_node; break; + /* 64-bit vectors. */ + case V8QImode: eltype = V8QI_type_node; break; + case V4HImode: eltype = V4HI_type_node; break; + case V2SImode: eltype = V2SI_type_node; break; + case V2SFmode: eltype = V2SF_type_node; break; + /* 128-bit vectors. 
*/ + case V16QImode: eltype = V16QI_type_node; break; + case V8HImode: eltype = V8HI_type_node; break; + case V4SImode: eltype = V4SI_type_node; break; + case V4SFmode: eltype = V4SF_type_node; break; + case V2DImode: eltype = V2DI_type_node; break; + default: gcc_unreachable (); + } + } + + if (k == 0 && !is_store) + return_type = eltype; + else + args = tree_cons (NULL_TREE, eltype, args); + } + + ftype = build_function_type (return_type, args); + } + break; + + case NEON_RESULTPAIR: + { + switch (insn_data[icode].operand[1].mode) + { + case V8QImode: ftype = void_ftype_pv8qi_v8qi_v8qi; break; + case V4HImode: ftype = void_ftype_pv4hi_v4hi_v4hi; break; + case V2SImode: ftype = void_ftype_pv2si_v2si_v2si; break; + case V2SFmode: ftype = void_ftype_pv2sf_v2sf_v2sf; break; + case DImode: ftype = void_ftype_pdi_di_di; break; + case V16QImode: ftype = void_ftype_pv16qi_v16qi_v16qi; break; + case V8HImode: ftype = void_ftype_pv8hi_v8hi_v8hi; break; + case V4SImode: ftype = void_ftype_pv4si_v4si_v4si; break; + case V4SFmode: ftype = void_ftype_pv4sf_v4sf_v4sf; break; + case V2DImode: ftype = void_ftype_pv2di_v2di_v2di; break; + default: gcc_unreachable (); + } + } + break; + + case NEON_REINTERP: + { + /* We iterate over 5 doubleword types, then 5 quadword + types. */ + int rhs = j % 5; + switch (insn_data[icode].operand[0].mode) + { + case V8QImode: ftype = reinterp_ftype_dreg[0][rhs]; break; + case V4HImode: ftype = reinterp_ftype_dreg[1][rhs]; break; + case V2SImode: ftype = reinterp_ftype_dreg[2][rhs]; break; + case V2SFmode: ftype = reinterp_ftype_dreg[3][rhs]; break; + case DImode: ftype = reinterp_ftype_dreg[4][rhs]; break; + case V16QImode: ftype = reinterp_ftype_qreg[0][rhs]; break; + case V8HImode: ftype = reinterp_ftype_qreg[1][rhs]; break; + case V4SImode: ftype = reinterp_ftype_qreg[2][rhs]; break; + case V4SFmode: ftype = reinterp_ftype_qreg[3][rhs]; break; + case V2DImode: ftype = reinterp_ftype_qreg[4][rhs]; break; + default: gcc_unreachable (); + } + } + break; + + default: + gcc_unreachable (); + } + + gcc_assert (ftype != NULL); + + sprintf (namebuf, "__builtin_neon_%s%s", d->name, modenames[j]); + + add_builtin_function (namebuf, ftype, fcode++, BUILT_IN_MD, NULL, + NULL_TREE); + } + } +} + static void arm_init_builtins (void) { @@ -13668,6 +15267,9 @@ arm_init_builtins (void) if (TARGET_REALLY_IWMMXT) arm_init_iwmmxt_builtins (); + + if (TARGET_NEON) + arm_init_neon_builtins (); } /* Errors in the source file can cause expand_expr to return const0_rtx @@ -13759,6 +15361,343 @@ arm_expand_unop_builtin (enum insn_code icode, return target; } +static int +neon_builtin_compare (const void *a, const void *b) +{ + const neon_builtin_datum *key = a; + const neon_builtin_datum *memb = b; + unsigned int soughtcode = key->base_fcode; + + if (soughtcode >= memb->base_fcode + && soughtcode < memb->base_fcode + memb->num_vars) + return 0; + else if (soughtcode < memb->base_fcode) + return -1; + else + return 1; +} + +static enum insn_code +locate_neon_builtin_icode (int fcode, neon_itype *itype) +{ + neon_builtin_datum key, *found; + int idx; + + key.base_fcode = fcode; + found = bsearch (&key, &neon_builtin_data[0], ARRAY_SIZE (neon_builtin_data), + sizeof (neon_builtin_data[0]), neon_builtin_compare); + gcc_assert (found); + idx = fcode - (int) found->base_fcode; + gcc_assert (idx >= 0 && idx < T_MAX && idx < (int)found->num_vars); + + if (itype) + *itype = found->itype; + + return found->codes[idx]; +} + +typedef enum { + NEON_ARG_COPY_TO_REG, + NEON_ARG_CONSTANT, + NEON_ARG_STOP +} 
builtin_arg; + +#define NEON_MAX_BUILTIN_ARGS 5 + +/* Expand a Neon builtin. */ +static rtx +arm_expand_neon_args (rtx target, int icode, int have_retval, + tree exp, ...) +{ + va_list ap; + rtx pat; + tree arg[NEON_MAX_BUILTIN_ARGS]; + rtx op[NEON_MAX_BUILTIN_ARGS]; + enum machine_mode tmode = insn_data[icode].operand[0].mode; + enum machine_mode mode[NEON_MAX_BUILTIN_ARGS]; + int argc = 0; + + if (have_retval + && (!target + || GET_MODE (target) != tmode + || !(*insn_data[icode].operand[0].predicate) (target, tmode))) + target = gen_reg_rtx (tmode); + + va_start (ap, exp); + + for (;;) + { + builtin_arg thisarg = va_arg (ap, int); + + if (thisarg == NEON_ARG_STOP) + break; + else + { + arg[argc] = CALL_EXPR_ARG (exp, argc); + op[argc] = expand_normal (arg[argc]); + mode[argc] = insn_data[icode].operand[argc + have_retval].mode; + + switch (thisarg) + { + case NEON_ARG_COPY_TO_REG: + /*gcc_assert (GET_MODE (op[argc]) == mode[argc]);*/ + if (!(*insn_data[icode].operand[argc + have_retval].predicate) + (op[argc], mode[argc])) + op[argc] = copy_to_mode_reg (mode[argc], op[argc]); + break; + + case NEON_ARG_CONSTANT: + /* FIXME: This error message is somewhat unhelpful. */ + if (!(*insn_data[icode].operand[argc + have_retval].predicate) + (op[argc], mode[argc])) + error ("argument must be a constant"); + break; + + case NEON_ARG_STOP: + gcc_unreachable (); + } + + argc++; + } + } + + va_end (ap); + + if (have_retval) + switch (argc) + { + case 1: + pat = GEN_FCN (icode) (target, op[0]); + break; + + case 2: + pat = GEN_FCN (icode) (target, op[0], op[1]); + break; + + case 3: + pat = GEN_FCN (icode) (target, op[0], op[1], op[2]); + break; + + case 4: + pat = GEN_FCN (icode) (target, op[0], op[1], op[2], op[3]); + break; + + case 5: + pat = GEN_FCN (icode) (target, op[0], op[1], op[2], op[3], op[4]); + break; + + default: + gcc_unreachable (); + } + else + switch (argc) + { + case 1: + pat = GEN_FCN (icode) (op[0]); + break; + + case 2: + pat = GEN_FCN (icode) (op[0], op[1]); + break; + + case 3: + pat = GEN_FCN (icode) (op[0], op[1], op[2]); + break; + + case 4: + pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]); + break; + + case 5: + pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]); + break; + + default: + gcc_unreachable (); + } + + if (!pat) + return 0; + + emit_insn (pat); + + return target; +} + +/* Expand a Neon builtin. These are "special" because they don't have symbolic + constants defined per-instruction or per instruction-variant. Instead, the + required info is looked up in the table neon_builtin_data. 
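The argument-passing protocol of arm_expand_neon_args is a sentinel-terminated varargs walk: each NEON_ARG_* code describes how to prepare the next operand, and NEON_ARG_STOP ends the list (read back with va_arg (ap, int), since the enum value arrives promoted to int). A minimal, self-contained model of the idiom (illustrative names, not GCC code):

#include <stdarg.h>
#include <stdio.h>

enum argkind { ARG_REG, ARG_CONST, ARG_STOP };

static int
count_args (int first, ...)
{
  va_list ap;
  int n = 0, kind = first;

  va_start (ap, first);
  while (kind != ARG_STOP)
    {
      n++;                      /* prepare operand n according to kind */
      kind = va_arg (ap, int);  /* enums arrive promoted to int */
    }
  va_end (ap);
  return n;
}

int main (void)
{
  printf ("%d\n", count_args (ARG_REG, ARG_REG, ARG_CONST, ARG_STOP)); /* 3 */
  return 0;
}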
*/ +static rtx +arm_expand_neon_builtin (int fcode, tree exp, rtx target) +{ + neon_itype itype; + enum insn_code icode = locate_neon_builtin_icode (fcode, &itype); + + switch (itype) + { + case NEON_UNOP: + case NEON_CONVERT: + case NEON_DUPLANE: + return arm_expand_neon_args (target, icode, 1, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, NEON_ARG_STOP); + + case NEON_BINOP: + case NEON_SETLANE: + case NEON_SCALARMUL: + case NEON_SCALARMULL: + case NEON_SCALARMULH: + case NEON_SHIFTINSERT: + case NEON_LOGICBINOP: + return arm_expand_neon_args (target, icode, 1, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, + NEON_ARG_STOP); + + case NEON_TERNOP: + return arm_expand_neon_args (target, icode, 1, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, + NEON_ARG_CONSTANT, NEON_ARG_STOP); + + case NEON_GETLANE: + case NEON_FIXCONV: + case NEON_SHIFTIMM: + return arm_expand_neon_args (target, icode, 1, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, NEON_ARG_CONSTANT, + NEON_ARG_STOP); + + case NEON_CREATE: + return arm_expand_neon_args (target, icode, 1, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_STOP); + + case NEON_DUP: + case NEON_SPLIT: + case NEON_REINTERP: + return arm_expand_neon_args (target, icode, 1, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_STOP); + + case NEON_COMBINE: + case NEON_VTBL: + return arm_expand_neon_args (target, icode, 1, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP); + + case NEON_RESULTPAIR: + return arm_expand_neon_args (target, icode, 0, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, + NEON_ARG_STOP); + + case NEON_LANEMUL: + case NEON_LANEMULL: + case NEON_LANEMULH: + return arm_expand_neon_args (target, icode, 1, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, + NEON_ARG_CONSTANT, NEON_ARG_STOP); + + case NEON_LANEMAC: + return arm_expand_neon_args (target, icode, 1, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, + NEON_ARG_CONSTANT, NEON_ARG_CONSTANT, NEON_ARG_STOP); + + case NEON_SHIFTACC: + return arm_expand_neon_args (target, icode, 1, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, + NEON_ARG_CONSTANT, NEON_ARG_STOP); + + case NEON_SCALARMAC: + return arm_expand_neon_args (target, icode, 1, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, + NEON_ARG_CONSTANT, NEON_ARG_STOP); + + case NEON_SELECT: + case NEON_VTBX: + return arm_expand_neon_args (target, icode, 1, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, + NEON_ARG_STOP); + + case NEON_LOAD1: + case NEON_LOADSTRUCT: + return arm_expand_neon_args (target, icode, 1, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_STOP); + + case NEON_LOAD1LANE: + case NEON_LOADSTRUCTLANE: + return arm_expand_neon_args (target, icode, 1, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, + NEON_ARG_STOP); + + case NEON_STORE1: + case NEON_STORESTRUCT: + return arm_expand_neon_args (target, icode, 0, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP); + + case NEON_STORE1LANE: + case NEON_STORESTRUCTLANE: + return arm_expand_neon_args (target, icode, 0, exp, + NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, + NEON_ARG_STOP); + } + + gcc_unreachable (); +} + +/* Emit code to reinterpret one Neon type as another, without altering bits. 
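arm_expand_neon_builtin begins by mapping the fcode back to its table entry through locate_neon_builtin_icode, whose comparison function treats each entry as the half-open range [base_fcode, base_fcode + num_vars); bsearch accepts such a comparison as long as the ranges are sorted and disjoint, which the sequential fcode allocation guarantees. A self-contained model, using plain integers in place of fcodes:

#include <stdio.h>
#include <stdlib.h>

typedef struct { unsigned int base, count; } range;

static int
range_cmp (const void *a, const void *b)
{
  const range *key = a;
  const range *memb = b;

  if (key->base >= memb->base && key->base < memb->base + memb->count)
    return 0;                   /* sought code falls inside this entry */
  return key->base < memb->base ? -1 : 1;
}

int main (void)
{
  range table[] = { { 0, 10 }, { 10, 3 }, { 13, 8 } };
  range key = { 11, 0 };
  range *hit = bsearch (&key, table, 3, sizeof (range), range_cmp);

  /* Prints "entry base 10, variant 1".  */
  printf ("entry base %u, variant %u\n", hit->base, key.base - hit->base);
  return 0;
}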
*/ +void +neon_reinterpret (rtx dest, rtx src) +{ + emit_move_insn (dest, gen_lowpart (GET_MODE (dest), src)); +} + +/* Emit code to place a Neon pair result in memory locations (with equal + registers). */ +void +neon_emit_pair_result_insn (enum machine_mode mode, + rtx (*intfn) (rtx, rtx, rtx, rtx), rtx destaddr, + rtx op1, rtx op2) +{ + rtx mem = gen_rtx_MEM (mode, destaddr); + rtx tmp1 = gen_reg_rtx (mode); + rtx tmp2 = gen_reg_rtx (mode); + + emit_insn (intfn (tmp1, op1, tmp2, op2)); + + emit_move_insn (mem, tmp1); + mem = adjust_address (mem, mode, GET_MODE_SIZE (mode)); + emit_move_insn (mem, tmp2); +} + +/* Set up operands for a register copy from src to dest, taking care not to + clobber registers in the process. + FIXME: This has rather high polynomial complexity (O(n^3)?) but shouldn't + be called with a large N, so that should be OK. */ + +void +neon_disambiguate_copy (rtx *operands, rtx *dest, rtx *src, unsigned int count) +{ + unsigned int copied = 0, opctr = 0; + unsigned int done = (1 << count) - 1; + unsigned int i, j; + + while (copied != done) + { + for (i = 0; i < count; i++) + { + int good = 1; + + for (j = 0; good && j < count; j++) + if (i != j && (copied & (1 << j)) == 0 + && reg_overlap_mentioned_p (src[j], dest[i])) + good = 0; + + if (good) + { + operands[opctr++] = dest[i]; + operands[opctr++] = src[i]; + copied |= 1 << i; + } + } + } + + gcc_assert (opctr == count * 2); +} + /* Expand an expression EXP that calls a built-in function, with result going to TARGET if that's convenient (and in mode MODE if that's convenient). @@ -13789,6 +15728,9 @@ arm_expand_builtin (tree exp, enum machine_mode mode1; enum machine_mode mode2; + if (fcode >= ARM_BUILTIN_NEON_BASE) + return arm_expand_neon_builtin (fcode, exp, target); + switch (fcode) { case ARM_BUILTIN_TEXTRMSB: @@ -15549,6 +17491,10 @@ arm_file_start (void) fpu_name = "vfp3"; set_float_abi_attributes = 1; break; + case FPUTYPE_NEON: + fpu_name = "neon"; + set_float_abi_attributes = 1; + break; default: abort(); } @@ -16182,7 +18128,6 @@ arm_no_early_mul_dep (rtx producer, rtx consumer) && !reg_overlap_mentioned_p (value, XEXP (op, 0))); } - /* We can't rely on the caller doing the proper promotion when using APCS or ATPCS. */ @@ -16404,6 +18349,11 @@ thumb_set_return_address (rtx source, rtx scratch) bool arm_vector_mode_supported_p (enum machine_mode mode) { + /* Neon also supports V2SImode, etc. listed in the clause below. */ + if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode + || mode == V16QImode || mode == V4SFmode || mode == V2DImode)) + return true; + if ((mode == V2SImode) || (mode == V4HImode) || (mode == V8QImode)) diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index b9c6e85..6c4d95e 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -65,6 +65,9 @@ extern char arm_arch_name[]; if (TARGET_VFP) \ builtin_define ("__VFP_FP__"); \ \ + if (TARGET_NEON) \ + builtin_define ("__ARM_NEON__"); \ + \ /* Add a define for interworking. \ Needed when building libgcc.a. */ \ if (arm_cpp_interwork) \ @@ -206,10 +209,23 @@ extern GTY(()) rtx aof_pic_label; /* 32-bit Thumb-2 code. */ #define TARGET_THUMB2 (TARGET_THUMB && arm_arch_thumb2) +/* The following two macros concern the ability to execute coprocessor + instructions for VFPv3 or NEON. TARGET_VFP3 is currently only ever + tested when we know we are generating for VFP hardware; we need to + be more careful with TARGET_NEON as noted below. */ + /* FPU is VFPv3 (with twice the number of D registers). 
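The copy-scheduling helper neon_disambiguate_copy, earlier in this hunk, orders a batch of register copies so that no copy overwrites a register that a not-yet-emitted copy still needs as a source, simply re-scanning until every copy has been scheduled. A toy model of the idea (illustrative: plain ints stand in for rtx registers and equality for reg_overlap_mentioned_p; this sketch also explicitly skips entries that have already been emitted, and, like the original, assumes the copies form no cycle):

#include <stdio.h>

static void
ordered_copies (const int *dest, const int *src, unsigned int count)
{
  unsigned int copied = 0, done = (1u << count) - 1;
  unsigned int i, j;

  while (copied != done)
    for (i = 0; i < count; i++)
      {
        int good = 1;

        if (copied & (1u << i))
          continue;             /* already emitted */

        for (j = 0; good && j < count; j++)
          if (i != j && (copied & (1u << j)) == 0 && src[j] == dest[i])
            good = 0;           /* dest still feeds a pending copy */

        if (good)
          {
            printf ("r%d <- r%d\n", dest[i], src[i]);
            copied |= 1u << i;
          }
      }
}

int main (void)
{
  /* r1 <- r0 must wait until r2 <- r1 has consumed r1, so the
     output is "r2 <- r1" followed by "r1 <- r0".  */
  int dest[] = { 1, 2 }, src[] = { 0, 1 };
  ordered_copies (dest, src, 2);
  return 0;
}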
Setting the FPU to Neon automatically enables VFPv3 too. */ #define TARGET_VFP3 (arm_fp_model == ARM_FP_MODEL_VFP \ - && (arm_fpu_arch == FPUTYPE_VFP3)) + && (arm_fpu_arch == FPUTYPE_VFP3 \ + || arm_fpu_arch == FPUTYPE_NEON)) +/* FPU supports Neon instructions. The setting of this macro gets + revealed via __ARM_NEON__ so we add extra guards upon TARGET_32BIT + and TARGET_HARD_FLOAT to ensure that NEON instructions are + available. */ +#define TARGET_NEON (TARGET_32BIT && TARGET_HARD_FLOAT \ + && arm_fp_model == ARM_FP_MODEL_VFP \ + && arm_fpu_arch == FPUTYPE_NEON) /* "DSP" multiply instructions, eg. SMULxy. */ #define TARGET_DSP_MULTIPLY \ @@ -282,7 +298,9 @@ enum fputype /* VFP. */ FPUTYPE_VFP, /* VFPv3. */ - FPUTYPE_VFP3 + FPUTYPE_VFP3, + /* Neon. */ + FPUTYPE_NEON }; /* Recast the floating point class to be the floating point attribute. */ @@ -483,6 +501,12 @@ extern int arm_arch_hwdiv; #define UNITS_PER_WORD 4 +/* Use the option -mvectorize-with-neon-quad to override the use of doubleword + registers when autovectorizing for Neon, at least until multiple vector + widths are supported properly by the middle-end. */ +#define UNITS_PER_SIMD_WORD \ + (TARGET_NEON ? (TARGET_NEON_VECTORIZE_QUAD ? 16 : 8) : UNITS_PER_WORD) + /* True if natural alignment is used for doubleword types. */ #define ARM_DOUBLEWORD_ALIGN TARGET_AAPCS_BASED @@ -941,6 +965,18 @@ extern int arm_structure_size_boundary; #define VFP_REGNO_OK_FOR_DOUBLE(REGNUM) \ ((((REGNUM) - FIRST_VFP_REGNUM) & 1) == 0) +/* Neon Quad values must start at a multiple of four registers. */ +#define NEON_REGNO_OK_FOR_QUAD(REGNUM) \ + ((((REGNUM) - FIRST_VFP_REGNUM) & 3) == 0) + +/* Neon structures of vectors must be in even register pairs and there + must be enough registers available. Because of various patterns + requiring quad registers, we require them to start at a multiple of + four. */ +#define NEON_REGNO_OK_FOR_NREGS(REGNUM, N) \ + ((((REGNUM) - FIRST_VFP_REGNUM) & 3) == 0 \ + && (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1)) + /* The number of hard registers is 16 ARM + 8 FPA + 1 CC + 1 SFP + 1 AFP. */ /* + 16 Cirrus registers take us up to 43. */ /* Intel Wireless MMX Technology registers add 16 + 4 more. */ @@ -994,6 +1030,21 @@ extern int arm_structure_size_boundary; #define VALID_IWMMXT_REG_MODE(MODE) \ (arm_vector_mode_supported_p (MODE) || (MODE) == DImode) +/* Modes valid for Neon D registers. */ +#define VALID_NEON_DREG_MODE(MODE) \ + ((MODE) == V2SImode || (MODE) == V4HImode || (MODE) == V8QImode \ + || (MODE) == V2SFmode || (MODE) == DImode) + +/* Modes valid for Neon Q registers. */ +#define VALID_NEON_QREG_MODE(MODE) \ + ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \ + || (MODE) == V4SFmode || (MODE) == V2DImode) + +/* Structure modes valid for Neon registers. */ +#define VALID_NEON_STRUCT_MODE(MODE) \ + ((MODE) == TImode || (MODE) == EImode || (MODE) == OImode \ + || (MODE) == CImode || (MODE) == XImode) + /* The order in which register should be allocated. It is good to use ip since no saving is required (though calls clobber it) and it never contains function parameters. It is quite good to use lr since other calls may @@ -2409,7 +2460,7 @@ extern int making_const_table; #define PRINT_OPERAND_PUNCT_VALID_P(CODE) \ (CODE == '@' || CODE == '|' || CODE == '.' 
\ - || CODE == '(' || CODE == ')' \ + || CODE == '(' || CODE == ')' || CODE == '#' \ || (TARGET_32BIT && (CODE == '?')) \ || (TARGET_THUMB2 && (CODE == '!')) \ || (TARGET_THUMB && (CODE == '_'))) @@ -2581,6 +2632,9 @@ extern int making_const_table; : arm_gen_return_addr_mask ()) +/* Neon defines builtins from ARM_BUILTIN_MAX upwards, though they don't have + symbolic names defined here (which would require too much duplication). + FIXME? */ enum arm_builtins { ARM_BUILTIN_GETWCX, @@ -2745,7 +2799,9 @@ enum arm_builtins ARM_BUILTIN_THREAD_POINTER, - ARM_BUILTIN_MAX + ARM_BUILTIN_NEON_BASE, + + ARM_BUILTIN_MAX = ARM_BUILTIN_NEON_BASE /* FIXME: Wrong! */ }; /* Do not emit .note.GNU-stack by default. */ diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index ab04176..ddc8bed28 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -51,6 +51,7 @@ ;; UNSPEC Usage: ;; Note: sin and cos are no-longer used. +;; Unspec constants for Neon are defined in neon.md. (define_constants [(UNSPEC_SIN 0) ; `sin' operation (MODE_FLOAT): @@ -121,12 +122,14 @@ ; a 32-bit object. (VUNSPEC_POOL_8 7) ; `pool-entry(8)'. An entry in the constant pool for ; a 64-bit object. - (VUNSPEC_TMRC 8) ; Used by the iWMMXt TMRC instruction. - (VUNSPEC_TMCR 9) ; Used by the iWMMXt TMCR instruction. - (VUNSPEC_ALIGN8 10) ; 8-byte alignment version of VUNSPEC_ALIGN - (VUNSPEC_WCMP_EQ 11) ; Used by the iWMMXt WCMPEQ instructions - (VUNSPEC_WCMP_GTU 12) ; Used by the iWMMXt WCMPGTU instructions - (VUNSPEC_WCMP_GT 13) ; Used by the iwMMXT WCMPGT instructions + (VUNSPEC_POOL_16 8) ; `pool-entry(16)'. An entry in the constant pool for + ; a 128-bit object. + (VUNSPEC_TMRC 9) ; Used by the iWMMXt TMRC instruction. + (VUNSPEC_TMCR 10) ; Used by the iWMMXt TMCR instruction. + (VUNSPEC_ALIGN8 11) ; 8-byte alignment version of VUNSPEC_ALIGN + (VUNSPEC_WCMP_EQ 12) ; Used by the iWMMXt WCMPEQ instructions + (VUNSPEC_WCMP_GTU 13) ; Used by the iWMMXt WCMPGTU instructions + (VUNSPEC_WCMP_GT 14) ; Used by the iwMMXT WCMPGT instructions (VUNSPEC_EH_RETURN 20); Use to override the return address for exception ; handling. 
] @@ -5768,27 +5771,6 @@ " ) -;; Vector Moves -(define_expand "movv2si" - [(set (match_operand:V2SI 0 "nonimmediate_operand" "") - (match_operand:V2SI 1 "general_operand" ""))] - "TARGET_REALLY_IWMMXT" -{ -}) - -(define_expand "movv4hi" - [(set (match_operand:V4HI 0 "nonimmediate_operand" "") - (match_operand:V4HI 1 "general_operand" ""))] - "TARGET_REALLY_IWMMXT" -{ -}) - -(define_expand "movv8qi" - [(set (match_operand:V8QI 0 "nonimmediate_operand" "") - (match_operand:V8QI 1 "general_operand" ""))] - "TARGET_REALLY_IWMMXT" -{ -}) ;; load- and store-multiple insns @@ -10731,6 +10713,30 @@ [(set_attr "length" "8")] ) +(define_insn "consttable_16" + [(unspec_volatile [(match_operand 0 "" "")] VUNSPEC_POOL_16)] + "TARGET_EITHER" + "* + { + making_const_table = TRUE; + switch (GET_MODE_CLASS (GET_MODE (operands[0]))) + { + case MODE_FLOAT: + { + REAL_VALUE_TYPE r; + REAL_VALUE_FROM_CONST_DOUBLE (r, operands[0]); + assemble_real (r, GET_MODE (operands[0]), BITS_PER_WORD); + break; + } + default: + assemble_integer (operands[0], 16, BITS_PER_WORD, 1); + break; + } + return \"\"; + }" + [(set_attr "length" "16")] +) + ;; Miscellaneous Thumb patterns (define_expand "tablejump" @@ -10906,10 +10912,14 @@ (include "fpa.md") ;; Load the Maverick co-processor patterns (include "cirrus.md") +;; Vector bits common to IWMMXT and Neon +(include "vec-common.md") ;; Load the Intel Wireless Multimedia Extension patterns (include "iwmmxt.md") ;; Load the VFP co-processor patterns (include "vfp.md") ;; Thumb-2 patterns (include "thumb2.md") +;; Neon patterns +(include "neon.md") diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt index 8f85ffb..6371742 100644 --- a/gcc/config/arm/arm.opt +++ b/gcc/config/arm/arm.opt @@ -153,3 +153,7 @@ Tune code for the given processor mwords-little-endian Target Report RejectNegative Mask(LITTLE_WORDS) Assume big endian bytes, little endian words + +mvectorize-with-neon-quad +Target Report Mask(NEON_VECTORIZE_QUAD) +Use Neon quad-word (rather than double-word) registers for vectorization diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h new file mode 100644 index 0000000..e013e17 --- /dev/null +++ b/gcc/config/arm/arm_neon.h @@ -0,0 +1,12179 @@ +/* ARM NEON intrinsics include file. This file is generated automatically + using neon-gen.ml. Please do not edit manually. + + Copyright (C) 2006, 2007 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 2, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING. If not, write to the + Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston, + MA 02110-1301, USA. */ + +/* As a special exception, if you include this header file into source + files compiled by GCC, this header file does not by itself cause + the resulting executable to be covered by the GNU General Public + License. This exception does not however invalidate any other + reasons why the executable file might be covered by the GNU General + Public License. 
*/ + +#ifndef _GCC_ARM_NEON_H +#define _GCC_ARM_NEON_H 1 + +#ifndef __ARM_NEON__ +#error You must enable NEON instructions (e.g. -mfloat-abi=softfp -mfpu=neon) to use arm_neon.h +#else + +#ifdef __cplusplus +extern "C" { +#endif + +#include <stdint.h> + +typedef __builtin_neon_qi int8x8_t __attribute__ ((__vector_size__ (8))); +typedef __builtin_neon_hi int16x4_t __attribute__ ((__vector_size__ (8))); +typedef __builtin_neon_si int32x2_t __attribute__ ((__vector_size__ (8))); +typedef __builtin_neon_di int64x1_t; +typedef __builtin_neon_sf float32x2_t __attribute__ ((__vector_size__ (8))); +typedef __builtin_neon_poly8 poly8x8_t __attribute__ ((__vector_size__ (8))); +typedef __builtin_neon_poly16 poly16x4_t __attribute__ ((__vector_size__ (8))); +typedef __builtin_neon_uqi uint8x8_t __attribute__ ((__vector_size__ (8))); +typedef __builtin_neon_uhi uint16x4_t __attribute__ ((__vector_size__ (8))); +typedef __builtin_neon_usi uint32x2_t __attribute__ ((__vector_size__ (8))); +typedef __builtin_neon_udi uint64x1_t; +typedef __builtin_neon_qi int8x16_t __attribute__ ((__vector_size__ (16))); +typedef __builtin_neon_hi int16x8_t __attribute__ ((__vector_size__ (16))); +typedef __builtin_neon_si int32x4_t __attribute__ ((__vector_size__ (16))); +typedef __builtin_neon_di int64x2_t __attribute__ ((__vector_size__ (16))); +typedef __builtin_neon_sf float32x4_t __attribute__ ((__vector_size__ (16))); +typedef __builtin_neon_poly8 poly8x16_t __attribute__ ((__vector_size__ (16))); +typedef __builtin_neon_poly16 poly16x8_t __attribute__ ((__vector_size__ (16))); +typedef __builtin_neon_uqi uint8x16_t __attribute__ ((__vector_size__ (16))); +typedef __builtin_neon_uhi uint16x8_t __attribute__ ((__vector_size__ (16))); +typedef __builtin_neon_usi uint32x4_t __attribute__ ((__vector_size__ (16))); +typedef __builtin_neon_udi uint64x2_t __attribute__ ((__vector_size__ (16))); + +typedef __builtin_neon_sf float32_t; +typedef __builtin_neon_poly8 poly8_t; +typedef __builtin_neon_poly16 poly16_t; + +typedef struct int8x8x2_t +{ + int8x8_t val[2]; +} int8x8x2_t; + +typedef struct int8x16x2_t +{ + int8x16_t val[2]; +} int8x16x2_t; + +typedef struct int16x4x2_t +{ + int16x4_t val[2]; +} int16x4x2_t; + +typedef struct int16x8x2_t +{ + int16x8_t val[2]; +} int16x8x2_t; + +typedef struct int32x2x2_t +{ + int32x2_t val[2]; +} int32x2x2_t; + +typedef struct int32x4x2_t +{ + int32x4_t val[2]; +} int32x4x2_t; + +typedef struct int64x1x2_t +{ + int64x1_t val[2]; +} int64x1x2_t; + +typedef struct int64x2x2_t +{ + int64x2_t val[2]; +} int64x2x2_t; + +typedef struct uint8x8x2_t +{ + uint8x8_t val[2]; +} uint8x8x2_t; + +typedef struct uint8x16x2_t +{ + uint8x16_t val[2]; +} uint8x16x2_t; + +typedef struct uint16x4x2_t +{ + uint16x4_t val[2]; +} uint16x4x2_t; + +typedef struct uint16x8x2_t +{ + uint16x8_t val[2]; +} uint16x8x2_t; + +typedef struct uint32x2x2_t +{ + uint32x2_t val[2]; +} uint32x2x2_t; + +typedef struct uint32x4x2_t +{ + uint32x4_t val[2]; +} uint32x4x2_t; + +typedef struct uint64x1x2_t +{ + uint64x1_t val[2]; +} uint64x1x2_t; + +typedef struct uint64x2x2_t +{ + uint64x2_t val[2]; +} uint64x2x2_t; + +typedef struct float32x2x2_t +{ + float32x2_t val[2]; +} float32x2x2_t; + +typedef struct float32x4x2_t +{ + float32x4_t val[2]; +} float32x4x2_t; + +typedef struct poly8x8x2_t +{ + poly8x8_t val[2]; +} poly8x8x2_t; + +typedef struct poly8x16x2_t +{ + poly8x16_t val[2]; +} poly8x16x2_t; + +typedef struct poly16x4x2_t +{ + poly16x4_t val[2]; +} poly16x4x2_t; + +typedef struct poly16x8x2_t +{ + poly16x8_t 
val[2]; +} poly16x8x2_t; + +typedef struct int8x8x3_t +{ + int8x8_t val[3]; +} int8x8x3_t; + +typedef struct int8x16x3_t +{ + int8x16_t val[3]; +} int8x16x3_t; + +typedef struct int16x4x3_t +{ + int16x4_t val[3]; +} int16x4x3_t; + +typedef struct int16x8x3_t +{ + int16x8_t val[3]; +} int16x8x3_t; + +typedef struct int32x2x3_t +{ + int32x2_t val[3]; +} int32x2x3_t; + +typedef struct int32x4x3_t +{ + int32x4_t val[3]; +} int32x4x3_t; + +typedef struct int64x1x3_t +{ + int64x1_t val[3]; +} int64x1x3_t; + +typedef struct int64x2x3_t +{ + int64x2_t val[3]; +} int64x2x3_t; + +typedef struct uint8x8x3_t +{ + uint8x8_t val[3]; +} uint8x8x3_t; + +typedef struct uint8x16x3_t +{ + uint8x16_t val[3]; +} uint8x16x3_t; + +typedef struct uint16x4x3_t +{ + uint16x4_t val[3]; +} uint16x4x3_t; + +typedef struct uint16x8x3_t +{ + uint16x8_t val[3]; +} uint16x8x3_t; + +typedef struct uint32x2x3_t +{ + uint32x2_t val[3]; +} uint32x2x3_t; + +typedef struct uint32x4x3_t +{ + uint32x4_t val[3]; +} uint32x4x3_t; + +typedef struct uint64x1x3_t +{ + uint64x1_t val[3]; +} uint64x1x3_t; + +typedef struct uint64x2x3_t +{ + uint64x2_t val[3]; +} uint64x2x3_t; + +typedef struct float32x2x3_t +{ + float32x2_t val[3]; +} float32x2x3_t; + +typedef struct float32x4x3_t +{ + float32x4_t val[3]; +} float32x4x3_t; + +typedef struct poly8x8x3_t +{ + poly8x8_t val[3]; +} poly8x8x3_t; + +typedef struct poly8x16x3_t +{ + poly8x16_t val[3]; +} poly8x16x3_t; + +typedef struct poly16x4x3_t +{ + poly16x4_t val[3]; +} poly16x4x3_t; + +typedef struct poly16x8x3_t +{ + poly16x8_t val[3]; +} poly16x8x3_t; + +typedef struct int8x8x4_t +{ + int8x8_t val[4]; +} int8x8x4_t; + +typedef struct int8x16x4_t +{ + int8x16_t val[4]; +} int8x16x4_t; + +typedef struct int16x4x4_t +{ + int16x4_t val[4]; +} int16x4x4_t; + +typedef struct int16x8x4_t +{ + int16x8_t val[4]; +} int16x8x4_t; + +typedef struct int32x2x4_t +{ + int32x2_t val[4]; +} int32x2x4_t; + +typedef struct int32x4x4_t +{ + int32x4_t val[4]; +} int32x4x4_t; + +typedef struct int64x1x4_t +{ + int64x1_t val[4]; +} int64x1x4_t; + +typedef struct int64x2x4_t +{ + int64x2_t val[4]; +} int64x2x4_t; + +typedef struct uint8x8x4_t +{ + uint8x8_t val[4]; +} uint8x8x4_t; + +typedef struct uint8x16x4_t +{ + uint8x16_t val[4]; +} uint8x16x4_t; + +typedef struct uint16x4x4_t +{ + uint16x4_t val[4]; +} uint16x4x4_t; + +typedef struct uint16x8x4_t +{ + uint16x8_t val[4]; +} uint16x8x4_t; + +typedef struct uint32x2x4_t +{ + uint32x2_t val[4]; +} uint32x2x4_t; + +typedef struct uint32x4x4_t +{ + uint32x4_t val[4]; +} uint32x4x4_t; + +typedef struct uint64x1x4_t +{ + uint64x1_t val[4]; +} uint64x1x4_t; + +typedef struct uint64x2x4_t +{ + uint64x2_t val[4]; +} uint64x2x4_t; + +typedef struct float32x2x4_t +{ + float32x2_t val[4]; +} float32x2x4_t; + +typedef struct float32x4x4_t +{ + float32x4_t val[4]; +} float32x4x4_t; + +typedef struct poly8x8x4_t +{ + poly8x8_t val[4]; +} poly8x8x4_t; + +typedef struct poly8x16x4_t +{ + poly8x16_t val[4]; +} poly8x16x4_t; + +typedef struct poly16x4x4_t +{ + poly16x4_t val[4]; +} poly16x4x4_t; + +typedef struct poly16x8x4_t +{ + poly16x8_t val[4]; +} poly16x8x4_t; + + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vadd_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vaddv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vadd_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vaddv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t 
__attribute__ ((__always_inline__)) +vadd_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vaddv2si (__a, __b, 1); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vadd_s64 (int64x1_t __a, int64x1_t __b) +{ + return (int64x1_t)__builtin_neon_vadddi (__a, __b, 1); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vadd_f32 (float32x2_t __a, float32x2_t __b) +{ + return (float32x2_t)__builtin_neon_vaddv2sf (__a, __b, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vadd_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vaddv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vadd_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vaddv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vadd_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vaddv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vadd_u64 (uint64x1_t __a, uint64x1_t __b) +{ + return (uint64x1_t)__builtin_neon_vadddi ((int64x1_t) __a, (int64x1_t) __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vaddq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vaddv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vaddq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vaddv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vaddq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vaddv4si (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vaddq_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int64x2_t)__builtin_neon_vaddv2di (__a, __b, 1); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vaddq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (float32x4_t)__builtin_neon_vaddv4sf (__a, __b, 3); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vaddq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vaddv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vaddq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vaddv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vaddq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vaddv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vaddq_u64 (uint64x2_t __a, uint64x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vaddv2di ((int64x2_t) __a, (int64x2_t) __b, 0); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vaddl_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int16x8_t)__builtin_neon_vaddlv8qi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vaddl_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int32x4_t)__builtin_neon_vaddlv4hi (__a, __b, 1); +} + +__extension__ static 
__inline int64x2_t __attribute__ ((__always_inline__)) +vaddl_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int64x2_t)__builtin_neon_vaddlv2si (__a, __b, 1); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vaddl_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vaddlv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vaddl_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vaddlv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vaddl_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vaddlv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vaddw_s8 (int16x8_t __a, int8x8_t __b) +{ + return (int16x8_t)__builtin_neon_vaddwv8qi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vaddw_s16 (int32x4_t __a, int16x4_t __b) +{ + return (int32x4_t)__builtin_neon_vaddwv4hi (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vaddw_s32 (int64x2_t __a, int32x2_t __b) +{ + return (int64x2_t)__builtin_neon_vaddwv2si (__a, __b, 1); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vaddw_u8 (uint16x8_t __a, uint8x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vaddwv8qi ((int16x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vaddw_u16 (uint32x4_t __a, uint16x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vaddwv4hi ((int32x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vaddw_u32 (uint64x2_t __a, uint32x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vaddwv2si ((int64x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vhadd_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vhaddv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vhadd_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vhaddv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vhadd_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vhaddv2si (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vhadd_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vhaddv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vhadd_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vhaddv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vhadd_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vhaddv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vhaddq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vhaddv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vhaddq_s16 (int16x8_t __a, int16x8_t __b) +{ + return 
(int16x8_t)__builtin_neon_vhaddv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vhaddq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vhaddv4si (__a, __b, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vhaddq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vhaddv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vhaddq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vhaddv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vhaddq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vhaddv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vrhadd_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vhaddv8qi (__a, __b, 5); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vrhadd_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vhaddv4hi (__a, __b, 5); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vrhadd_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vhaddv2si (__a, __b, 5); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vrhadd_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vhaddv8qi ((int8x8_t) __a, (int8x8_t) __b, 4); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vrhadd_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vhaddv4hi ((int16x4_t) __a, (int16x4_t) __b, 4); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vrhadd_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vhaddv2si ((int32x2_t) __a, (int32x2_t) __b, 4); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vrhaddq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vhaddv16qi (__a, __b, 5); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vrhaddq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vhaddv8hi (__a, __b, 5); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vrhaddq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vhaddv4si (__a, __b, 5); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vrhaddq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vhaddv16qi ((int8x16_t) __a, (int8x16_t) __b, 4); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vrhaddq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vhaddv8hi ((int16x8_t) __a, (int16x8_t) __b, 4); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vrhaddq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vhaddv4si ((int32x4_t) __a, (int32x4_t) __b, 4); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vqadd_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vqaddv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ 
((__always_inline__)) +vqadd_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vqaddv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vqadd_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vqaddv2si (__a, __b, 1); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vqadd_s64 (int64x1_t __a, int64x1_t __b) +{ + return (int64x1_t)__builtin_neon_vqadddi (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vqadd_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vqaddv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vqadd_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vqaddv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vqadd_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vqaddv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vqadd_u64 (uint64x1_t __a, uint64x1_t __b) +{ + return (uint64x1_t)__builtin_neon_vqadddi ((int64x1_t) __a, (int64x1_t) __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vqaddq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vqaddv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vqaddq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vqaddv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqaddq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vqaddv4si (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vqaddq_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int64x2_t)__builtin_neon_vqaddv2di (__a, __b, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vqaddq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vqaddv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vqaddq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vqaddv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vqaddq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vqaddv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vqaddq_u64 (uint64x2_t __a, uint64x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vqaddv2di ((int64x2_t) __a, (int64x2_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vaddhn_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int8x8_t)__builtin_neon_vaddhnv8hi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vaddhn_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int16x4_t)__builtin_neon_vaddhnv4si (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vaddhn_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int32x2_t)__builtin_neon_vaddhnv2di (__a, __b, 1); +} + +__extension__ 
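/* [Editor's sketch -- illustrative, not part of the patch.]  The vqadd_*
   family saturates instead of wrapping: lanes clamp to the element
   type's minimum/maximum on overflow.  Assuming vdup_n_u8 (defined
   elsewhere in this header) to splat a scalar:

     #include <arm_neon.h>

     uint8x8_t
     sat_demo (void)
     {
       // 200 + 100 would wrap to 44 with a plain add; vqadd_u8 clamps to 255.
       return vqadd_u8 (vdup_n_u8 (200), vdup_n_u8 (100));
     }
*/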
static __inline uint8x8_t __attribute__ ((__always_inline__)) +vaddhn_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vaddhnv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vaddhn_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vaddhnv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vaddhn_u64 (uint64x2_t __a, uint64x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vaddhnv2di ((int64x2_t) __a, (int64x2_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vraddhn_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int8x8_t)__builtin_neon_vaddhnv8hi (__a, __b, 5); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vraddhn_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int16x4_t)__builtin_neon_vaddhnv4si (__a, __b, 5); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vraddhn_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int32x2_t)__builtin_neon_vaddhnv2di (__a, __b, 5); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vraddhn_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vaddhnv8hi ((int16x8_t) __a, (int16x8_t) __b, 4); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vraddhn_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vaddhnv4si ((int32x4_t) __a, (int32x4_t) __b, 4); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vraddhn_u64 (uint64x2_t __a, uint64x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vaddhnv2di ((int64x2_t) __a, (int64x2_t) __b, 4); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vmul_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vmulv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vmul_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vmulv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vmul_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vmulv2si (__a, __b, 1); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vmul_f32 (float32x2_t __a, float32x2_t __b) +{ + return (float32x2_t)__builtin_neon_vmulv2sf (__a, __b, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vmul_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vmulv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vmul_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vmulv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vmul_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vmulv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vmul_p8 (poly8x8_t __a, poly8x8_t __b) +{ + return (poly8x8_t)__builtin_neon_vmulv8qi ((int8x8_t) __a, (int8x8_t) __b, 2); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vmulq_s8 (int8x16_t __a, int8x16_t __b) 
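/* [Editor's sketch -- illustrative, not part of the patch.]  vaddhn_*
   adds at full width and keeps only the high half of each lane,
   narrowing the result; vraddhn_* rounds before taking the high half.
   For example:

     #include <arm_neon.h>

     uint8x8_t
     high_sum (uint16x8_t a, uint16x8_t b)
     {
       return vaddhn_u16 (a, b);   // (uint8_t) ((a[i] + b[i]) >> 8)
     }
*/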
+{ + return (int8x16_t)__builtin_neon_vmulv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vmulq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vmulv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmulq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vmulv4si (__a, __b, 1); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vmulq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (float32x4_t)__builtin_neon_vmulv4sf (__a, __b, 3); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vmulq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vmulv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vmulq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vmulv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmulq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vmulv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vmulq_p8 (poly8x16_t __a, poly8x16_t __b) +{ + return (poly8x16_t)__builtin_neon_vmulv16qi ((int8x16_t) __a, (int8x16_t) __b, 2); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vqdmulh_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vqdmulhv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vqdmulh_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vqdmulhv2si (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vqdmulhq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vqdmulhv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqdmulhq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vqdmulhv4si (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vqrdmulh_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vqdmulhv4hi (__a, __b, 5); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vqrdmulh_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vqdmulhv2si (__a, __b, 5); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vqrdmulhq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vqdmulhv8hi (__a, __b, 5); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqrdmulhq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vqdmulhv4si (__a, __b, 5); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vmull_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int16x8_t)__builtin_neon_vmullv8qi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmull_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int32x4_t)__builtin_neon_vmullv4hi (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vmull_s32 (int32x2_t __a, int32x2_t __b) +{ + return 
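/* [Editor's sketch -- illustrative, not part of the patch.]  vqdmulh_*
   returns the saturated high half of the doubled product,
   sat ((2 * a[i] * b[i]) >> 16) for 16-bit lanes, which is exactly a
   Q15 fixed-point multiply; vqrdmulh_* additionally rounds.  E.g.:

     #include <arm_neon.h>

     int16x4_t
     q15_mul (int16x4_t a, int16x4_t b)   // a, b as Q15 fractions
     {
       return vqrdmulh_s16 (a, b);        // rounded Q15 product
     }
*/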
(int64x2_t)__builtin_neon_vmullv2si (__a, __b, 1); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vmull_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vmullv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmull_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vmullv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vmull_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vmullv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vmull_p8 (poly8x8_t __a, poly8x8_t __b) +{ + return (poly16x8_t)__builtin_neon_vmullv8qi ((int8x8_t) __a, (int8x8_t) __b, 2); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqdmull_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int32x4_t)__builtin_neon_vqdmullv4hi (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vqdmull_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int64x2_t)__builtin_neon_vqdmullv2si (__a, __b, 1); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vmla_s8 (int8x8_t __a, int8x8_t __b, int8x8_t __c) +{ + return (int8x8_t)__builtin_neon_vmlav8qi (__a, __b, __c, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vmla_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c) +{ + return (int16x4_t)__builtin_neon_vmlav4hi (__a, __b, __c, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vmla_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c) +{ + return (int32x2_t)__builtin_neon_vmlav2si (__a, __b, __c, 1); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vmla_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c) +{ + return (float32x2_t)__builtin_neon_vmlav2sf (__a, __b, __c, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vmla_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c) +{ + return (uint8x8_t)__builtin_neon_vmlav8qi ((int8x8_t) __a, (int8x8_t) __b, (int8x8_t) __c, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vmla_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c) +{ + return (uint16x4_t)__builtin_neon_vmlav4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vmla_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c) +{ + return (uint32x2_t)__builtin_neon_vmlav2si ((int32x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vmlaq_s8 (int8x16_t __a, int8x16_t __b, int8x16_t __c) +{ + return (int8x16_t)__builtin_neon_vmlav16qi (__a, __b, __c, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vmlaq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c) +{ + return (int16x8_t)__builtin_neon_vmlav8hi (__a, __b, __c, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmlaq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c) +{ + return (int32x4_t)__builtin_neon_vmlav4si (__a, __b, __c, 1); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vmlaq_f32 
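/* [Editor's sketch -- illustrative, not part of the patch.]  vmull_*
   produces full-width products, so no precision is lost; vmull_p8 is a
   carry-less (polynomial) multiply over GF(2).  vqdmull_* doubles and
   saturates the widened product, matching the vqdmulh semantics above.
   E.g.:

     #include <arm_neon.h>

     uint32x4_t
     exact_prod (uint16x4_t a, uint16x4_t b)
     {
       return vmull_u16 (a, b);   // (uint32) a[i] * b[i], exact
     }
*/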
(float32x4_t __a, float32x4_t __b, float32x4_t __c) +{ + return (float32x4_t)__builtin_neon_vmlav4sf (__a, __b, __c, 3); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vmlaq_u8 (uint8x16_t __a, uint8x16_t __b, uint8x16_t __c) +{ + return (uint8x16_t)__builtin_neon_vmlav16qi ((int8x16_t) __a, (int8x16_t) __b, (int8x16_t) __c, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vmlaq_u16 (uint16x8_t __a, uint16x8_t __b, uint16x8_t __c) +{ + return (uint16x8_t)__builtin_neon_vmlav8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x8_t) __c, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmlaq_u32 (uint32x4_t __a, uint32x4_t __b, uint32x4_t __c) +{ + return (uint32x4_t)__builtin_neon_vmlav4si ((int32x4_t) __a, (int32x4_t) __b, (int32x4_t) __c, 0); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vmlal_s8 (int16x8_t __a, int8x8_t __b, int8x8_t __c) +{ + return (int16x8_t)__builtin_neon_vmlalv8qi (__a, __b, __c, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmlal_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c) +{ + return (int32x4_t)__builtin_neon_vmlalv4hi (__a, __b, __c, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vmlal_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c) +{ + return (int64x2_t)__builtin_neon_vmlalv2si (__a, __b, __c, 1); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vmlal_u8 (uint16x8_t __a, uint8x8_t __b, uint8x8_t __c) +{ + return (uint16x8_t)__builtin_neon_vmlalv8qi ((int16x8_t) __a, (int8x8_t) __b, (int8x8_t) __c, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmlal_u16 (uint32x4_t __a, uint16x4_t __b, uint16x4_t __c) +{ + return (uint32x4_t)__builtin_neon_vmlalv4hi ((int32x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vmlal_u32 (uint64x2_t __a, uint32x2_t __b, uint32x2_t __c) +{ + return (uint64x2_t)__builtin_neon_vmlalv2si ((int64x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, 0); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqdmlal_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c) +{ + return (int32x4_t)__builtin_neon_vqdmlalv4hi (__a, __b, __c, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vqdmlal_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c) +{ + return (int64x2_t)__builtin_neon_vqdmlalv2si (__a, __b, __c, 1); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vmls_s8 (int8x8_t __a, int8x8_t __b, int8x8_t __c) +{ + return (int8x8_t)__builtin_neon_vmlsv8qi (__a, __b, __c, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vmls_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c) +{ + return (int16x4_t)__builtin_neon_vmlsv4hi (__a, __b, __c, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vmls_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c) +{ + return (int32x2_t)__builtin_neon_vmlsv2si (__a, __b, __c, 1); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vmls_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c) +{ + return (float32x2_t)__builtin_neon_vmlsv2sf (__a, __b, __c, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) 
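/* [Editor's sketch -- illustrative, not part of the patch.]  The
   widening multiply-accumulate forms are the workhorse of dot-product
   style loops: vmlal_* multiplies at narrow width, widens, and adds to
   a wide accumulator in one step.  E.g.:

     #include <arm_neon.h>

     uint32x4_t
     mac (uint32x4_t acc, uint16x4_t a, uint16x4_t b)
     {
       return vmlal_u16 (acc, a, b);   // acc[i] + (uint32) a[i] * b[i]
     }
*/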
+vmls_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c) +{ + return (uint8x8_t)__builtin_neon_vmlsv8qi ((int8x8_t) __a, (int8x8_t) __b, (int8x8_t) __c, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vmls_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c) +{ + return (uint16x4_t)__builtin_neon_vmlsv4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vmls_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c) +{ + return (uint32x2_t)__builtin_neon_vmlsv2si ((int32x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vmlsq_s8 (int8x16_t __a, int8x16_t __b, int8x16_t __c) +{ + return (int8x16_t)__builtin_neon_vmlsv16qi (__a, __b, __c, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vmlsq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c) +{ + return (int16x8_t)__builtin_neon_vmlsv8hi (__a, __b, __c, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmlsq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c) +{ + return (int32x4_t)__builtin_neon_vmlsv4si (__a, __b, __c, 1); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vmlsq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c) +{ + return (float32x4_t)__builtin_neon_vmlsv4sf (__a, __b, __c, 3); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vmlsq_u8 (uint8x16_t __a, uint8x16_t __b, uint8x16_t __c) +{ + return (uint8x16_t)__builtin_neon_vmlsv16qi ((int8x16_t) __a, (int8x16_t) __b, (int8x16_t) __c, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vmlsq_u16 (uint16x8_t __a, uint16x8_t __b, uint16x8_t __c) +{ + return (uint16x8_t)__builtin_neon_vmlsv8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x8_t) __c, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmlsq_u32 (uint32x4_t __a, uint32x4_t __b, uint32x4_t __c) +{ + return (uint32x4_t)__builtin_neon_vmlsv4si ((int32x4_t) __a, (int32x4_t) __b, (int32x4_t) __c, 0); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vmlsl_s8 (int16x8_t __a, int8x8_t __b, int8x8_t __c) +{ + return (int16x8_t)__builtin_neon_vmlslv8qi (__a, __b, __c, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmlsl_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c) +{ + return (int32x4_t)__builtin_neon_vmlslv4hi (__a, __b, __c, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vmlsl_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c) +{ + return (int64x2_t)__builtin_neon_vmlslv2si (__a, __b, __c, 1); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vmlsl_u8 (uint16x8_t __a, uint8x8_t __b, uint8x8_t __c) +{ + return (uint16x8_t)__builtin_neon_vmlslv8qi ((int16x8_t) __a, (int8x8_t) __b, (int8x8_t) __c, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmlsl_u16 (uint32x4_t __a, uint16x4_t __b, uint16x4_t __c) +{ + return (uint32x4_t)__builtin_neon_vmlslv4hi ((int32x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vmlsl_u32 (uint64x2_t __a, uint32x2_t __b, uint32x2_t __c) +{ + return (uint64x2_t)__builtin_neon_vmlslv2si ((int64x2_t) __a, (int32x2_t) 
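/* [Editor's sketch -- illustrative, not part of the patch.]  vmls* and
   vmlsl* mirror the accumulate forms with a subtraction, e.g.
   vmlsl_s16 (acc, a, b) computes acc[i] - (int32) a[i] * b[i] per lane,
   which suits the update step of filters:

     #include <arm_neon.h>

     int32x4_t
     mac_sub (int32x4_t acc, int16x4_t a, int16x4_t b)
     {
       return vmlsl_s16 (acc, a, b);
     }
*/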
__b, (int32x2_t) __c, 0); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqdmlsl_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c) +{ + return (int32x4_t)__builtin_neon_vqdmlslv4hi (__a, __b, __c, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vqdmlsl_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c) +{ + return (int64x2_t)__builtin_neon_vqdmlslv2si (__a, __b, __c, 1); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vsub_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vsubv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vsub_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vsubv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vsub_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vsubv2si (__a, __b, 1); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vsub_s64 (int64x1_t __a, int64x1_t __b) +{ + return (int64x1_t)__builtin_neon_vsubdi (__a, __b, 1); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vsub_f32 (float32x2_t __a, float32x2_t __b) +{ + return (float32x2_t)__builtin_neon_vsubv2sf (__a, __b, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vsub_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vsubv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vsub_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vsubv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vsub_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vsubv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vsub_u64 (uint64x1_t __a, uint64x1_t __b) +{ + return (uint64x1_t)__builtin_neon_vsubdi ((int64x1_t) __a, (int64x1_t) __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vsubq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vsubv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vsubq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vsubv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vsubq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vsubv4si (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vsubq_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int64x2_t)__builtin_neon_vsubv2di (__a, __b, 1); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vsubq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (float32x4_t)__builtin_neon_vsubv4sf (__a, __b, 3); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vsubq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vsubv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vsubq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vsubv8hi 
((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vsubq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vsubv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vsubq_u64 (uint64x2_t __a, uint64x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vsubv2di ((int64x2_t) __a, (int64x2_t) __b, 0); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vsubl_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int16x8_t)__builtin_neon_vsublv8qi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vsubl_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int32x4_t)__builtin_neon_vsublv4hi (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vsubl_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int64x2_t)__builtin_neon_vsublv2si (__a, __b, 1); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vsubl_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vsublv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vsubl_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vsublv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vsubl_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vsublv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vsubw_s8 (int16x8_t __a, int8x8_t __b) +{ + return (int16x8_t)__builtin_neon_vsubwv8qi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vsubw_s16 (int32x4_t __a, int16x4_t __b) +{ + return (int32x4_t)__builtin_neon_vsubwv4hi (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vsubw_s32 (int64x2_t __a, int32x2_t __b) +{ + return (int64x2_t)__builtin_neon_vsubwv2si (__a, __b, 1); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vsubw_u8 (uint16x8_t __a, uint8x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vsubwv8qi ((int16x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vsubw_u16 (uint32x4_t __a, uint16x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vsubwv4hi ((int32x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vsubw_u32 (uint64x2_t __a, uint32x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vsubwv2si ((int64x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vhsub_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vhsubv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vhsub_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vhsubv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vhsub_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vhsubv2si (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vhsub_u8 (uint8x8_t __a, uint8x8_t __b) +{ + 
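/* [Editor's sketch -- illustrative, not part of the patch.]  As with
   addition, the widening subtracts avoid overflow entirely: vsubl_s8
   widens both operands to 16 bits before subtracting, so every
   possible difference is representable:

     #include <arm_neon.h>

     int16x8_t
     diff (int8x8_t a, int8x8_t b)
     {
       return vsubl_s8 (a, b);   // (int16) a[i] - (int16) b[i]
     }
*/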
return (uint8x8_t)__builtin_neon_vhsubv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vhsub_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vhsubv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vhsub_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vhsubv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vhsubq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vhsubv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vhsubq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vhsubv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vhsubq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vhsubv4si (__a, __b, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vhsubq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vhsubv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vhsubq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vhsubv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vhsubq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vhsubv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vqsub_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vqsubv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vqsub_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vqsubv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vqsub_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vqsubv2si (__a, __b, 1); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vqsub_s64 (int64x1_t __a, int64x1_t __b) +{ + return (int64x1_t)__builtin_neon_vqsubdi (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vqsub_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vqsubv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vqsub_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vqsubv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vqsub_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vqsubv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vqsub_u64 (uint64x1_t __a, uint64x1_t __b) +{ + return (uint64x1_t)__builtin_neon_vqsubdi ((int64x1_t) __a, (int64x1_t) __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vqsubq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vqsubv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t 
__attribute__ ((__always_inline__)) +vqsubq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vqsubv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqsubq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vqsubv4si (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vqsubq_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int64x2_t)__builtin_neon_vqsubv2di (__a, __b, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vqsubq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vqsubv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vqsubq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vqsubv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vqsubq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vqsubv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vqsubq_u64 (uint64x2_t __a, uint64x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vqsubv2di ((int64x2_t) __a, (int64x2_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vsubhn_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int8x8_t)__builtin_neon_vsubhnv8hi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vsubhn_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int16x4_t)__builtin_neon_vsubhnv4si (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vsubhn_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int32x2_t)__builtin_neon_vsubhnv2di (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vsubhn_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vsubhnv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vsubhn_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vsubhnv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vsubhn_u64 (uint64x2_t __a, uint64x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vsubhnv2di ((int64x2_t) __a, (int64x2_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vrsubhn_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int8x8_t)__builtin_neon_vsubhnv8hi (__a, __b, 5); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vrsubhn_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int16x4_t)__builtin_neon_vsubhnv4si (__a, __b, 5); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vrsubhn_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int32x2_t)__builtin_neon_vsubhnv2di (__a, __b, 5); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vrsubhn_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vsubhnv8hi ((int16x8_t) __a, (int16x8_t) __b, 4); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vrsubhn_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return 
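/* [Editor's sketch -- illustrative, not part of the patch.]  Unsigned
   saturating subtraction clamps at zero, so vqsub_u8 (a, b) yields
   max (a[i] - b[i], 0).  ORing the two orderings is the classic
   absolute-difference idiom (assuming vorr_u8, defined elsewhere in
   this header), although vabd_* below does it in one step:

     #include <arm_neon.h>

     uint8x8_t
     absdiff (uint8x8_t a, uint8x8_t b)
     {
       return vorr_u8 (vqsub_u8 (a, b), vqsub_u8 (b, a));
     }
*/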
(uint16x4_t)__builtin_neon_vsubhnv4si ((int32x4_t) __a, (int32x4_t) __b, 4); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vrsubhn_u64 (uint64x2_t __a, uint64x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vsubhnv2di ((int64x2_t) __a, (int64x2_t) __b, 4); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vceq_s8 (int8x8_t __a, int8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vceqv8qi (__a, __b, 1); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vceq_s16 (int16x4_t __a, int16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vceqv4hi (__a, __b, 1); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vceq_s32 (int32x2_t __a, int32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vceqv2si (__a, __b, 1); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vceq_f32 (float32x2_t __a, float32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vceqv2sf (__a, __b, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vceq_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vceqv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vceq_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vceqv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vceq_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vceqv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vceq_p8 (poly8x8_t __a, poly8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vceqv8qi ((int8x8_t) __a, (int8x8_t) __b, 2); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vceqq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vceqv16qi (__a, __b, 1); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vceqq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vceqv8hi (__a, __b, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vceqq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vceqv4si (__a, __b, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vceqq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vceqv4sf (__a, __b, 3); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vceqq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vceqv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vceqq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vceqv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vceqq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vceqv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vceqq_p8 (poly8x16_t __a, poly8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vceqv16qi ((int8x16_t) __a, (int8x16_t) __b, 2); +} + +__extension__ static __inline uint8x8_t __attribute__ 
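/* [Editor's sketch -- illustrative, not part of the patch.]  All of the
   comparison intrinsics return unsigned lane masks: all ones where the
   predicate holds, all zeros elsewhere.  That makes them compose
   directly with bitwise selection (vbsl_*, defined elsewhere in this
   header) for branch-free code:

     #include <arm_neon.h>

     float32x2_t
     select_eq (float32x2_t a, float32x2_t b, float32x2_t x, float32x2_t y)
     {
       uint32x2_t m = vceq_f32 (a, b);   // 0xffffffff where a[i] == b[i]
       return vbsl_f32 (m, x, y);        // x where mask set, else y
     }
*/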
((__always_inline__)) +vcge_s8 (int8x8_t __a, int8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vcgev8qi (__a, __b, 1); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcge_s16 (int16x4_t __a, int16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vcgev4hi (__a, __b, 1); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vcge_s32 (int32x2_t __a, int32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vcgev2si (__a, __b, 1); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vcge_f32 (float32x2_t __a, float32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vcgev2sf (__a, __b, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vcge_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vcgev8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcge_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vcgev4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vcge_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vcgev2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vcgeq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vcgev16qi (__a, __b, 1); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcgeq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vcgev8hi (__a, __b, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vcgeq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vcgev4si (__a, __b, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vcgeq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vcgev4sf (__a, __b, 3); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vcgeq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vcgev16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcgeq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vcgev8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vcgeq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vcgev4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vcle_s8 (int8x8_t __a, int8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vcgev8qi (__b, __a, 1); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcle_s16 (int16x4_t __a, int16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vcgev4hi (__b, __a, 1); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vcle_s32 (int32x2_t __a, int32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vcgev2si (__b, __a, 1); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vcle_f32 (float32x2_t __a, float32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vcgev2sf (__b, __a, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vcle_u8 
(uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vcgev8qi ((int8x8_t) __b, (int8x8_t) __a, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcle_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vcgev4hi ((int16x4_t) __b, (int16x4_t) __a, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vcle_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vcgev2si ((int32x2_t) __b, (int32x2_t) __a, 0); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vcleq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vcgev16qi (__b, __a, 1); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcleq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vcgev8hi (__b, __a, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vcleq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vcgev4si (__b, __a, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vcleq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vcgev4sf (__b, __a, 3); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vcleq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vcgev16qi ((int8x16_t) __b, (int8x16_t) __a, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcleq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vcgev8hi ((int16x8_t) __b, (int16x8_t) __a, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vcleq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vcgev4si ((int32x4_t) __b, (int32x4_t) __a, 0); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vcgt_s8 (int8x8_t __a, int8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vcgtv8qi (__a, __b, 1); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcgt_s16 (int16x4_t __a, int16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vcgtv4hi (__a, __b, 1); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vcgt_s32 (int32x2_t __a, int32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vcgtv2si (__a, __b, 1); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vcgt_f32 (float32x2_t __a, float32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vcgtv2sf (__a, __b, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vcgt_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vcgtv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vcgt_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vcgtv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vcgt_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vcgtv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vcgtq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vcgtv16qi (__a, __b, 1); +} + +__extension__ static __inline uint16x8_t 
__attribute__ ((__always_inline__)) +vcgtq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vcgtv8hi (__a, __b, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vcgtq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vcgtv4si (__a, __b, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vcgtq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vcgtv4sf (__a, __b, 3); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vcgtq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vcgtv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcgtq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vcgtv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vcgtq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vcgtv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vclt_s8 (int8x8_t __a, int8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vcgtv8qi (__b, __a, 1); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vclt_s16 (int16x4_t __a, int16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vcgtv4hi (__b, __a, 1); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vclt_s32 (int32x2_t __a, int32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vcgtv2si (__b, __a, 1); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vclt_f32 (float32x2_t __a, float32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vcgtv2sf (__b, __a, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vclt_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vcgtv8qi ((int8x8_t) __b, (int8x8_t) __a, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vclt_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vcgtv4hi ((int16x4_t) __b, (int16x4_t) __a, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vclt_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vcgtv2si ((int32x2_t) __b, (int32x2_t) __a, 0); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vcltq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vcgtv16qi (__b, __a, 1); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vcltq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vcgtv8hi (__b, __a, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vcltq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vcgtv4si (__b, __a, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vcltq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vcgtv4sf (__b, __a, 3); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vcltq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vcgtv16qi ((int8x16_t) __b, (int8x16_t) __a, 0); +} + +__extension__ static __inline 
uint16x8_t __attribute__ ((__always_inline__)) +vcltq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vcgtv8hi ((int16x8_t) __b, (int16x8_t) __a, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vcltq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vcgtv4si ((int32x4_t) __b, (int32x4_t) __a, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vcage_f32 (float32x2_t __a, float32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vcagev2sf (__a, __b, 3); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vcageq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vcagev4sf (__a, __b, 3); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vcale_f32 (float32x2_t __a, float32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vcagev2sf (__b, __a, 3); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vcaleq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vcagev4sf (__b, __a, 3); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vcagt_f32 (float32x2_t __a, float32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vcagtv2sf (__a, __b, 3); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vcagtq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vcagtv4sf (__a, __b, 3); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vcalt_f32 (float32x2_t __a, float32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vcagtv2sf (__b, __a, 3); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vcaltq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vcagtv4sf (__b, __a, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vtst_s8 (int8x8_t __a, int8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vtstv8qi (__a, __b, 1); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vtst_s16 (int16x4_t __a, int16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vtstv4hi (__a, __b, 1); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vtst_s32 (int32x2_t __a, int32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vtstv2si (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vtst_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vtstv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vtst_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vtstv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vtst_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vtstv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vtst_p8 (poly8x8_t __a, poly8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vtstv8qi ((int8x8_t) __a, (int8x8_t) __b, 2); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vtstq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vtstv16qi (__a, __b, 1); +} + +__extension__ static 
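/* [Editor's sketch -- illustrative, not part of the patch.]  Note that
   vcle/vclt (and vcale/vcalt) carry no builtin of their own: they call
   the vcge/vcgt (vcage/vcagt) builtins with the operands swapped.  The
   vcage/vcagt family compares absolute values, which is convenient for
   magnitude tests, and vtst_* tests for any common set bits:

     #include <arm_neon.h>

     uint32x2_t
     louder (float32x2_t a, float32x2_t b)
     {
       return vcagt_f32 (a, b);   // all ones where |a[i]| > |b[i]|
     }
*/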
__inline uint16x8_t __attribute__ ((__always_inline__)) +vtstq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vtstv8hi (__a, __b, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vtstq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vtstv4si (__a, __b, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vtstq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vtstv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vtstq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vtstv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vtstq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vtstv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vtstq_p8 (poly8x16_t __a, poly8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vtstv16qi ((int8x16_t) __a, (int8x16_t) __b, 2); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vabd_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vabdv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vabd_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vabdv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vabd_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vabdv2si (__a, __b, 1); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vabd_f32 (float32x2_t __a, float32x2_t __b) +{ + return (float32x2_t)__builtin_neon_vabdv2sf (__a, __b, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vabd_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vabdv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vabd_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vabdv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vabd_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vabdv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vabdq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vabdv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vabdq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vabdv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vabdq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vabdv4si (__a, __b, 1); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vabdq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (float32x4_t)__builtin_neon_vabdv4sf (__a, __b, 3); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vabdq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vabdv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + 
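/* [Editor's sketch -- illustrative, not part of the patch.]  vabd_*
   computes per-lane absolute differences directly; together with the
   widening and accumulating forms below it builds
   sum-of-absolute-differences kernels without intermediate overflow:

     #include <arm_neon.h>

     uint8x8_t
     sad_step (uint8x8_t a, uint8x8_t b)
     {
       return vabd_u8 (a, b);   // |a[i] - b[i]|
     }
*/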
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vabdq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vabdv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vabdq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vabdv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vabdl_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int16x8_t)__builtin_neon_vabdlv8qi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vabdl_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int32x4_t)__builtin_neon_vabdlv4hi (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vabdl_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int64x2_t)__builtin_neon_vabdlv2si (__a, __b, 1); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vabdl_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vabdlv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vabdl_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vabdlv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vabdl_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vabdlv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vaba_s8 (int8x8_t __a, int8x8_t __b, int8x8_t __c) +{ + return (int8x8_t)__builtin_neon_vabav8qi (__a, __b, __c, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vaba_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c) +{ + return (int16x4_t)__builtin_neon_vabav4hi (__a, __b, __c, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vaba_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c) +{ + return (int32x2_t)__builtin_neon_vabav2si (__a, __b, __c, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vaba_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c) +{ + return (uint8x8_t)__builtin_neon_vabav8qi ((int8x8_t) __a, (int8x8_t) __b, (int8x8_t) __c, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vaba_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c) +{ + return (uint16x4_t)__builtin_neon_vabav4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vaba_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c) +{ + return (uint32x2_t)__builtin_neon_vabav2si ((int32x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vabaq_s8 (int8x16_t __a, int8x16_t __b, int8x16_t __c) +{ + return (int8x16_t)__builtin_neon_vabav16qi (__a, __b, __c, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vabaq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c) +{ + return (int16x8_t)__builtin_neon_vabav8hi (__a, __b, __c, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vabaq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c) +{ + return 
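/* [Editor's sketch -- illustrative, not part of the patch.]  vaba_*
   folds the absolute difference straight into an accumulator,
   acc[i] + |a[i] - b[i]|; the vabal_* forms below widen as well, so an
   8-bit SAD can accumulate into 16-bit lanes:

     #include <arm_neon.h>

     uint16x4_t
     sad_acc (uint16x4_t acc, uint16x4_t a, uint16x4_t b)
     {
       return vaba_u16 (acc, a, b);
     }
*/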
(int32x4_t)__builtin_neon_vabav4si (__a, __b, __c, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vabaq_u8 (uint8x16_t __a, uint8x16_t __b, uint8x16_t __c) +{ + return (uint8x16_t)__builtin_neon_vabav16qi ((int8x16_t) __a, (int8x16_t) __b, (int8x16_t) __c, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vabaq_u16 (uint16x8_t __a, uint16x8_t __b, uint16x8_t __c) +{ + return (uint16x8_t)__builtin_neon_vabav8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x8_t) __c, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vabaq_u32 (uint32x4_t __a, uint32x4_t __b, uint32x4_t __c) +{ + return (uint32x4_t)__builtin_neon_vabav4si ((int32x4_t) __a, (int32x4_t) __b, (int32x4_t) __c, 0); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vabal_s8 (int16x8_t __a, int8x8_t __b, int8x8_t __c) +{ + return (int16x8_t)__builtin_neon_vabalv8qi (__a, __b, __c, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vabal_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c) +{ + return (int32x4_t)__builtin_neon_vabalv4hi (__a, __b, __c, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vabal_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c) +{ + return (int64x2_t)__builtin_neon_vabalv2si (__a, __b, __c, 1); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vabal_u8 (uint16x8_t __a, uint8x8_t __b, uint8x8_t __c) +{ + return (uint16x8_t)__builtin_neon_vabalv8qi ((int16x8_t) __a, (int8x8_t) __b, (int8x8_t) __c, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vabal_u16 (uint32x4_t __a, uint16x4_t __b, uint16x4_t __c) +{ + return (uint32x4_t)__builtin_neon_vabalv4hi ((int32x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vabal_u32 (uint64x2_t __a, uint32x2_t __b, uint32x2_t __c) +{ + return (uint64x2_t)__builtin_neon_vabalv2si ((int64x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vmax_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vmaxv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vmax_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vmaxv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vmax_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vmaxv2si (__a, __b, 1); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vmax_f32 (float32x2_t __a, float32x2_t __b) +{ + return (float32x2_t)__builtin_neon_vmaxv2sf (__a, __b, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vmax_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vmaxv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vmax_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vmaxv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vmax_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vmaxv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static 
__inline int8x16_t __attribute__ ((__always_inline__)) +vmaxq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vmaxv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vmaxq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vmaxv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmaxq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vmaxv4si (__a, __b, 1); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vmaxq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (float32x4_t)__builtin_neon_vmaxv4sf (__a, __b, 3); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vmaxq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vmaxv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vmaxq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vmaxv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmaxq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vmaxv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vmin_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vminv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vmin_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vminv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vmin_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vminv2si (__a, __b, 1); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vmin_f32 (float32x2_t __a, float32x2_t __b) +{ + return (float32x2_t)__builtin_neon_vminv2sf (__a, __b, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vmin_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vminv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vmin_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vminv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vmin_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vminv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vminq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vminv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vminq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vminv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vminq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vminv4si (__a, __b, 1); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vminq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (float32x4_t)__builtin_neon_vminv4sf (__a, __b, 3); +} + +__extension__ static __inline uint8x16_t __attribute__ 
((__always_inline__)) +vminq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vminv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vminq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vminv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vminq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vminv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vpadd_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vpaddv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vpadd_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vpaddv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vpadd_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vpaddv2si (__a, __b, 1); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vpadd_f32 (float32x2_t __a, float32x2_t __b) +{ + return (float32x2_t)__builtin_neon_vpaddv2sf (__a, __b, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vpadd_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vpaddv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vpadd_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vpaddv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vpadd_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vpaddv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vpaddl_s8 (int8x8_t __a) +{ + return (int16x4_t)__builtin_neon_vpaddlv8qi (__a, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vpaddl_s16 (int16x4_t __a) +{ + return (int32x2_t)__builtin_neon_vpaddlv4hi (__a, 1); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vpaddl_s32 (int32x2_t __a) +{ + return (int64x1_t)__builtin_neon_vpaddlv2si (__a, 1); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vpaddl_u8 (uint8x8_t __a) +{ + return (uint16x4_t)__builtin_neon_vpaddlv8qi ((int8x8_t) __a, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vpaddl_u16 (uint16x4_t __a) +{ + return (uint32x2_t)__builtin_neon_vpaddlv4hi ((int16x4_t) __a, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vpaddl_u32 (uint32x2_t __a) +{ + return (uint64x1_t)__builtin_neon_vpaddlv2si ((int32x2_t) __a, 0); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vpaddlq_s8 (int8x16_t __a) +{ + return (int16x8_t)__builtin_neon_vpaddlv16qi (__a, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vpaddlq_s16 (int16x8_t __a) +{ + return (int32x4_t)__builtin_neon_vpaddlv8hi (__a, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vpaddlq_s32 (int32x4_t __a) +{ + return (int64x2_t)__builtin_neon_vpaddlv4si (__a, 1); +} + +__extension__ 
static __inline uint16x8_t __attribute__ ((__always_inline__)) +vpaddlq_u8 (uint8x16_t __a) +{ + return (uint16x8_t)__builtin_neon_vpaddlv16qi ((int8x16_t) __a, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vpaddlq_u16 (uint16x8_t __a) +{ + return (uint32x4_t)__builtin_neon_vpaddlv8hi ((int16x8_t) __a, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vpaddlq_u32 (uint32x4_t __a) +{ + return (uint64x2_t)__builtin_neon_vpaddlv4si ((int32x4_t) __a, 0); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vpadal_s8 (int16x4_t __a, int8x8_t __b) +{ + return (int16x4_t)__builtin_neon_vpadalv8qi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vpadal_s16 (int32x2_t __a, int16x4_t __b) +{ + return (int32x2_t)__builtin_neon_vpadalv4hi (__a, __b, 1); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vpadal_s32 (int64x1_t __a, int32x2_t __b) +{ + return (int64x1_t)__builtin_neon_vpadalv2si (__a, __b, 1); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vpadal_u8 (uint16x4_t __a, uint8x8_t __b) +{ + return (uint16x4_t)__builtin_neon_vpadalv8qi ((int16x4_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vpadal_u16 (uint32x2_t __a, uint16x4_t __b) +{ + return (uint32x2_t)__builtin_neon_vpadalv4hi ((int32x2_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vpadal_u32 (uint64x1_t __a, uint32x2_t __b) +{ + return (uint64x1_t)__builtin_neon_vpadalv2si ((int64x1_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vpadalq_s8 (int16x8_t __a, int8x16_t __b) +{ + return (int16x8_t)__builtin_neon_vpadalv16qi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vpadalq_s16 (int32x4_t __a, int16x8_t __b) +{ + return (int32x4_t)__builtin_neon_vpadalv8hi (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vpadalq_s32 (int64x2_t __a, int32x4_t __b) +{ + return (int64x2_t)__builtin_neon_vpadalv4si (__a, __b, 1); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vpadalq_u8 (uint16x8_t __a, uint8x16_t __b) +{ + return (uint16x8_t)__builtin_neon_vpadalv16qi ((int16x8_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vpadalq_u16 (uint32x4_t __a, uint16x8_t __b) +{ + return (uint32x4_t)__builtin_neon_vpadalv8hi ((int32x4_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vpadalq_u32 (uint64x2_t __a, uint32x4_t __b) +{ + return (uint64x2_t)__builtin_neon_vpadalv4si ((int64x2_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vpmax_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vpmaxv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vpmax_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vpmaxv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vpmax_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vpmaxv2si (__a, __b, 1); +} + +__extension__ static 
__inline float32x2_t __attribute__ ((__always_inline__)) +vpmax_f32 (float32x2_t __a, float32x2_t __b) +{ + return (float32x2_t)__builtin_neon_vpmaxv2sf (__a, __b, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vpmax_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vpmaxv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vpmax_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vpmaxv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vpmax_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vpmaxv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vpmin_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vpminv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vpmin_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vpminv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vpmin_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vpminv2si (__a, __b, 1); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vpmin_f32 (float32x2_t __a, float32x2_t __b) +{ + return (float32x2_t)__builtin_neon_vpminv2sf (__a, __b, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vpmin_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vpminv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vpmin_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vpminv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vpmin_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vpminv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vrecps_f32 (float32x2_t __a, float32x2_t __b) +{ + return (float32x2_t)__builtin_neon_vrecpsv2sf (__a, __b, 3); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vrecpsq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (float32x4_t)__builtin_neon_vrecpsv4sf (__a, __b, 3); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vrsqrts_f32 (float32x2_t __a, float32x2_t __b) +{ + return (float32x2_t)__builtin_neon_vrsqrtsv2sf (__a, __b, 3); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vrsqrtsq_f32 (float32x4_t __a, float32x4_t __b) +{ + return (float32x4_t)__builtin_neon_vrsqrtsv4sf (__a, __b, 3); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vshl_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vshlv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vshl_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vshlv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vshl_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vshlv2si (__a, __b, 1); +} + 
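The widening pairwise adds above (vpaddl/vpadal) make horizontal reductions cheap without overflow, since each step doubles the lane width. A usage sketch, not part of the patch — vget_lane_u64 is assumed from elsewhere in this header:

#include <arm_neon.h>

/* Horizontal sum of eight bytes: no partial sum can wrap.  */
uint32_t
sum8 (uint8x8_t v)
{
  uint16x4_t s16 = vpaddl_u8 (v);     /* 8x8  -> 4x16, VPADDL.U8  */
  uint32x2_t s32 = vpaddl_u16 (s16);  /* 4x16 -> 2x32, VPADDL.U16 */
  uint64x1_t s64 = vpaddl_u32 (s32);  /* 2x32 -> 1x64, VPADDL.U32 */
  return (uint32_t) vget_lane_u64 (s64, 0);
}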
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vshl_s64 (int64x1_t __a, int64x1_t __b) +{ + return (int64x1_t)__builtin_neon_vshldi (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vshl_u8 (uint8x8_t __a, int8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vshlv8qi ((int8x8_t) __a, __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vshl_u16 (uint16x4_t __a, int16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vshlv4hi ((int16x4_t) __a, __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vshl_u32 (uint32x2_t __a, int32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vshlv2si ((int32x2_t) __a, __b, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vshl_u64 (uint64x1_t __a, int64x1_t __b) +{ + return (uint64x1_t)__builtin_neon_vshldi ((int64x1_t) __a, __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vshlq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vshlv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vshlq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vshlv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vshlq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vshlv4si (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vshlq_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int64x2_t)__builtin_neon_vshlv2di (__a, __b, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vshlq_u8 (uint8x16_t __a, int8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vshlv16qi ((int8x16_t) __a, __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vshlq_u16 (uint16x8_t __a, int16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vshlv8hi ((int16x8_t) __a, __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vshlq_u32 (uint32x4_t __a, int32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vshlv4si ((int32x4_t) __a, __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vshlq_u64 (uint64x2_t __a, int64x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vshlv2di ((int64x2_t) __a, __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vrshl_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vshlv8qi (__a, __b, 5); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vrshl_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vshlv4hi (__a, __b, 5); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vrshl_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vshlv2si (__a, __b, 5); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vrshl_s64 (int64x1_t __a, int64x1_t __b) +{ + return (int64x1_t)__builtin_neon_vshldi (__a, __b, 5); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vrshl_u8 (uint8x8_t __a, int8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vshlv8qi ((int8x8_t) __a, __b, 4); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vrshl_u16 
(uint16x4_t __a, int16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vshlv4hi ((int16x4_t) __a, __b, 4); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vrshl_u32 (uint32x2_t __a, int32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vshlv2si ((int32x2_t) __a, __b, 4); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vrshl_u64 (uint64x1_t __a, int64x1_t __b) +{ + return (uint64x1_t)__builtin_neon_vshldi ((int64x1_t) __a, __b, 4); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vrshlq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vshlv16qi (__a, __b, 5); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vrshlq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vshlv8hi (__a, __b, 5); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vrshlq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vshlv4si (__a, __b, 5); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vrshlq_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int64x2_t)__builtin_neon_vshlv2di (__a, __b, 5); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vrshlq_u8 (uint8x16_t __a, int8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vshlv16qi ((int8x16_t) __a, __b, 4); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vrshlq_u16 (uint16x8_t __a, int16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vshlv8hi ((int16x8_t) __a, __b, 4); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vrshlq_u32 (uint32x4_t __a, int32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vshlv4si ((int32x4_t) __a, __b, 4); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vrshlq_u64 (uint64x2_t __a, int64x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vshlv2di ((int64x2_t) __a, __b, 4); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vqshl_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vqshlv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vqshl_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vqshlv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vqshl_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vqshlv2si (__a, __b, 1); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vqshl_s64 (int64x1_t __a, int64x1_t __b) +{ + return (int64x1_t)__builtin_neon_vqshldi (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vqshl_u8 (uint8x8_t __a, int8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vqshlv8qi ((int8x8_t) __a, __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vqshl_u16 (uint16x4_t __a, int16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vqshlv4hi ((int16x4_t) __a, __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vqshl_u32 (uint32x2_t __a, int32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vqshlv2si ((int32x2_t) __a, __b, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vqshl_u64 (uint64x1_t __a, int64x1_t __b) +{ + return 
(uint64x1_t)__builtin_neon_vqshldi ((int64x1_t) __a, __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vqshlq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vqshlv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vqshlq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vqshlv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqshlq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vqshlv4si (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vqshlq_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int64x2_t)__builtin_neon_vqshlv2di (__a, __b, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vqshlq_u8 (uint8x16_t __a, int8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vqshlv16qi ((int8x16_t) __a, __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vqshlq_u16 (uint16x8_t __a, int16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vqshlv8hi ((int16x8_t) __a, __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vqshlq_u32 (uint32x4_t __a, int32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vqshlv4si ((int32x4_t) __a, __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vqshlq_u64 (uint64x2_t __a, int64x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vqshlv2di ((int64x2_t) __a, __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vqrshl_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vqshlv8qi (__a, __b, 5); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vqrshl_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vqshlv4hi (__a, __b, 5); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vqrshl_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vqshlv2si (__a, __b, 5); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vqrshl_s64 (int64x1_t __a, int64x1_t __b) +{ + return (int64x1_t)__builtin_neon_vqshldi (__a, __b, 5); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vqrshl_u8 (uint8x8_t __a, int8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vqshlv8qi ((int8x8_t) __a, __b, 4); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vqrshl_u16 (uint16x4_t __a, int16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vqshlv4hi ((int16x4_t) __a, __b, 4); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vqrshl_u32 (uint32x2_t __a, int32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vqshlv2si ((int32x2_t) __a, __b, 4); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vqrshl_u64 (uint64x1_t __a, int64x1_t __b) +{ + return (uint64x1_t)__builtin_neon_vqshldi ((int64x1_t) __a, __b, 4); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vqrshlq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vqshlv16qi (__a, __b, 5); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vqrshlq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vqshlv8hi (__a, __b, 5); 
+} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqrshlq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vqshlv4si (__a, __b, 5); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vqrshlq_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int64x2_t)__builtin_neon_vqshlv2di (__a, __b, 5); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vqrshlq_u8 (uint8x16_t __a, int8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vqshlv16qi ((int8x16_t) __a, __b, 4); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vqrshlq_u16 (uint16x8_t __a, int16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vqshlv8hi ((int16x8_t) __a, __b, 4); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vqrshlq_u32 (uint32x4_t __a, int32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vqshlv4si ((int32x4_t) __a, __b, 4); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vqrshlq_u64 (uint64x2_t __a, int64x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vqshlv2di ((int64x2_t) __a, __b, 4); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vshr_n_s8 (int8x8_t __a, const int __b) +{ + return (int8x8_t)__builtin_neon_vshr_nv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vshr_n_s16 (int16x4_t __a, const int __b) +{ + return (int16x4_t)__builtin_neon_vshr_nv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vshr_n_s32 (int32x2_t __a, const int __b) +{ + return (int32x2_t)__builtin_neon_vshr_nv2si (__a, __b, 1); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vshr_n_s64 (int64x1_t __a, const int __b) +{ + return (int64x1_t)__builtin_neon_vshr_ndi (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vshr_n_u8 (uint8x8_t __a, const int __b) +{ + return (uint8x8_t)__builtin_neon_vshr_nv8qi ((int8x8_t) __a, __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vshr_n_u16 (uint16x4_t __a, const int __b) +{ + return (uint16x4_t)__builtin_neon_vshr_nv4hi ((int16x4_t) __a, __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vshr_n_u32 (uint32x2_t __a, const int __b) +{ + return (uint32x2_t)__builtin_neon_vshr_nv2si ((int32x2_t) __a, __b, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vshr_n_u64 (uint64x1_t __a, const int __b) +{ + return (uint64x1_t)__builtin_neon_vshr_ndi ((int64x1_t) __a, __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vshrq_n_s8 (int8x16_t __a, const int __b) +{ + return (int8x16_t)__builtin_neon_vshr_nv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vshrq_n_s16 (int16x8_t __a, const int __b) +{ + return (int16x8_t)__builtin_neon_vshr_nv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vshrq_n_s32 (int32x4_t __a, const int __b) +{ + return (int32x4_t)__builtin_neon_vshr_nv4si (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vshrq_n_s64 (int64x2_t __a, const int __b) +{ + return (int64x2_t)__builtin_neon_vshr_nv2di (__a, __b, 1); +} + +__extension__ static __inline 
uint8x16_t __attribute__ ((__always_inline__)) +vshrq_n_u8 (uint8x16_t __a, const int __b) +{ + return (uint8x16_t)__builtin_neon_vshr_nv16qi ((int8x16_t) __a, __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vshrq_n_u16 (uint16x8_t __a, const int __b) +{ + return (uint16x8_t)__builtin_neon_vshr_nv8hi ((int16x8_t) __a, __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vshrq_n_u32 (uint32x4_t __a, const int __b) +{ + return (uint32x4_t)__builtin_neon_vshr_nv4si ((int32x4_t) __a, __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vshrq_n_u64 (uint64x2_t __a, const int __b) +{ + return (uint64x2_t)__builtin_neon_vshr_nv2di ((int64x2_t) __a, __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vrshr_n_s8 (int8x8_t __a, const int __b) +{ + return (int8x8_t)__builtin_neon_vshr_nv8qi (__a, __b, 5); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vrshr_n_s16 (int16x4_t __a, const int __b) +{ + return (int16x4_t)__builtin_neon_vshr_nv4hi (__a, __b, 5); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vrshr_n_s32 (int32x2_t __a, const int __b) +{ + return (int32x2_t)__builtin_neon_vshr_nv2si (__a, __b, 5); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vrshr_n_s64 (int64x1_t __a, const int __b) +{ + return (int64x1_t)__builtin_neon_vshr_ndi (__a, __b, 5); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vrshr_n_u8 (uint8x8_t __a, const int __b) +{ + return (uint8x8_t)__builtin_neon_vshr_nv8qi ((int8x8_t) __a, __b, 4); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vrshr_n_u16 (uint16x4_t __a, const int __b) +{ + return (uint16x4_t)__builtin_neon_vshr_nv4hi ((int16x4_t) __a, __b, 4); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vrshr_n_u32 (uint32x2_t __a, const int __b) +{ + return (uint32x2_t)__builtin_neon_vshr_nv2si ((int32x2_t) __a, __b, 4); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vrshr_n_u64 (uint64x1_t __a, const int __b) +{ + return (uint64x1_t)__builtin_neon_vshr_ndi ((int64x1_t) __a, __b, 4); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vrshrq_n_s8 (int8x16_t __a, const int __b) +{ + return (int8x16_t)__builtin_neon_vshr_nv16qi (__a, __b, 5); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vrshrq_n_s16 (int16x8_t __a, const int __b) +{ + return (int16x8_t)__builtin_neon_vshr_nv8hi (__a, __b, 5); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vrshrq_n_s32 (int32x4_t __a, const int __b) +{ + return (int32x4_t)__builtin_neon_vshr_nv4si (__a, __b, 5); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vrshrq_n_s64 (int64x2_t __a, const int __b) +{ + return (int64x2_t)__builtin_neon_vshr_nv2di (__a, __b, 5); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vrshrq_n_u8 (uint8x16_t __a, const int __b) +{ + return (uint8x16_t)__builtin_neon_vshr_nv16qi ((int8x16_t) __a, __b, 4); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vrshrq_n_u16 (uint16x8_t __a, const int __b) +{ + return (uint16x8_t)__builtin_neon_vshr_nv8hi ((int16x8_t) __a, __b, 4); +} + +__extension__ static 
__inline uint32x4_t __attribute__ ((__always_inline__)) +vrshrq_n_u32 (uint32x4_t __a, const int __b) +{ + return (uint32x4_t)__builtin_neon_vshr_nv4si ((int32x4_t) __a, __b, 4); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vrshrq_n_u64 (uint64x2_t __a, const int __b) +{ + return (uint64x2_t)__builtin_neon_vshr_nv2di ((int64x2_t) __a, __b, 4); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vshrn_n_s16 (int16x8_t __a, const int __b) +{ + return (int8x8_t)__builtin_neon_vshrn_nv8hi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vshrn_n_s32 (int32x4_t __a, const int __b) +{ + return (int16x4_t)__builtin_neon_vshrn_nv4si (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vshrn_n_s64 (int64x2_t __a, const int __b) +{ + return (int32x2_t)__builtin_neon_vshrn_nv2di (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vshrn_n_u16 (uint16x8_t __a, const int __b) +{ + return (uint8x8_t)__builtin_neon_vshrn_nv8hi ((int16x8_t) __a, __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vshrn_n_u32 (uint32x4_t __a, const int __b) +{ + return (uint16x4_t)__builtin_neon_vshrn_nv4si ((int32x4_t) __a, __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vshrn_n_u64 (uint64x2_t __a, const int __b) +{ + return (uint32x2_t)__builtin_neon_vshrn_nv2di ((int64x2_t) __a, __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vrshrn_n_s16 (int16x8_t __a, const int __b) +{ + return (int8x8_t)__builtin_neon_vshrn_nv8hi (__a, __b, 5); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vrshrn_n_s32 (int32x4_t __a, const int __b) +{ + return (int16x4_t)__builtin_neon_vshrn_nv4si (__a, __b, 5); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vrshrn_n_s64 (int64x2_t __a, const int __b) +{ + return (int32x2_t)__builtin_neon_vshrn_nv2di (__a, __b, 5); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vrshrn_n_u16 (uint16x8_t __a, const int __b) +{ + return (uint8x8_t)__builtin_neon_vshrn_nv8hi ((int16x8_t) __a, __b, 4); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vrshrn_n_u32 (uint32x4_t __a, const int __b) +{ + return (uint16x4_t)__builtin_neon_vshrn_nv4si ((int32x4_t) __a, __b, 4); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vrshrn_n_u64 (uint64x2_t __a, const int __b) +{ + return (uint32x2_t)__builtin_neon_vshrn_nv2di ((int64x2_t) __a, __b, 4); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vqshrn_n_s16 (int16x8_t __a, const int __b) +{ + return (int8x8_t)__builtin_neon_vqshrn_nv8hi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vqshrn_n_s32 (int32x4_t __a, const int __b) +{ + return (int16x4_t)__builtin_neon_vqshrn_nv4si (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vqshrn_n_s64 (int64x2_t __a, const int __b) +{ + return (int32x2_t)__builtin_neon_vqshrn_nv2di (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vqshrn_n_u16 (uint16x8_t __a, const int __b) +{ + return (uint8x8_t)__builtin_neon_vqshrn_nv8hi ((int16x8_t) __a, __b, 0); +} + 
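Note that the _n shift forms declare their count as const int: it must be an integer constant expression (for the right shifts, in [1, element width]), because it becomes an immediate field in the instruction. A rounding-narrow sketch, not part of the patch:

#include <arm_neon.h>

/* Rescale 16-bit intermediates to 8 bits with rounding:
   (acc + 32) >> 6 per lane, then narrow: VRSHRN.I16.  */
int8x8_t
descale (int16x8_t acc)
{
  return vrshrn_n_s16 (acc, 6);
}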
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vqshrn_n_u32 (uint32x4_t __a, const int __b) +{ + return (uint16x4_t)__builtin_neon_vqshrn_nv4si ((int32x4_t) __a, __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vqshrn_n_u64 (uint64x2_t __a, const int __b) +{ + return (uint32x2_t)__builtin_neon_vqshrn_nv2di ((int64x2_t) __a, __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vqrshrn_n_s16 (int16x8_t __a, const int __b) +{ + return (int8x8_t)__builtin_neon_vqshrn_nv8hi (__a, __b, 5); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vqrshrn_n_s32 (int32x4_t __a, const int __b) +{ + return (int16x4_t)__builtin_neon_vqshrn_nv4si (__a, __b, 5); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vqrshrn_n_s64 (int64x2_t __a, const int __b) +{ + return (int32x2_t)__builtin_neon_vqshrn_nv2di (__a, __b, 5); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vqrshrn_n_u16 (uint16x8_t __a, const int __b) +{ + return (uint8x8_t)__builtin_neon_vqshrn_nv8hi ((int16x8_t) __a, __b, 4); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vqrshrn_n_u32 (uint32x4_t __a, const int __b) +{ + return (uint16x4_t)__builtin_neon_vqshrn_nv4si ((int32x4_t) __a, __b, 4); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vqrshrn_n_u64 (uint64x2_t __a, const int __b) +{ + return (uint32x2_t)__builtin_neon_vqshrn_nv2di ((int64x2_t) __a, __b, 4); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vqshrun_n_s16 (int16x8_t __a, const int __b) +{ + return (uint8x8_t)__builtin_neon_vqshrun_nv8hi (__a, __b, 1); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vqshrun_n_s32 (int32x4_t __a, const int __b) +{ + return (uint16x4_t)__builtin_neon_vqshrun_nv4si (__a, __b, 1); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vqshrun_n_s64 (int64x2_t __a, const int __b) +{ + return (uint32x2_t)__builtin_neon_vqshrun_nv2di (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vqrshrun_n_s16 (int16x8_t __a, const int __b) +{ + return (uint8x8_t)__builtin_neon_vqshrun_nv8hi (__a, __b, 5); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vqrshrun_n_s32 (int32x4_t __a, const int __b) +{ + return (uint16x4_t)__builtin_neon_vqshrun_nv4si (__a, __b, 5); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vqrshrun_n_s64 (int64x2_t __a, const int __b) +{ + return (uint32x2_t)__builtin_neon_vqshrun_nv2di (__a, __b, 5); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vshl_n_s8 (int8x8_t __a, const int __b) +{ + return (int8x8_t)__builtin_neon_vshl_nv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vshl_n_s16 (int16x4_t __a, const int __b) +{ + return (int16x4_t)__builtin_neon_vshl_nv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vshl_n_s32 (int32x2_t __a, const int __b) +{ + return (int32x2_t)__builtin_neon_vshl_nv2si (__a, __b, 1); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vshl_n_s64 (int64x1_t __a, const int __b) +{ + return (int64x1_t)__builtin_neon_vshl_ndi (__a, __b, 1); +} + 
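The vqshrun_n/vqrshrun_n forms just above are the odd ones out: signed input, unsigned saturated output (note the dedicated __builtin_neon_vqshrun builtins), which is exactly what clamping signed filter accumulators to pixel range needs. A sketch, not part of the patch:

#include <arm_neon.h>

/* Round, shift right by 7, and saturate each signed lane to
   [0, 255]: VQRSHRUN.S16.  */
uint8x8_t
to_pixels (int16x8_t acc)
{
  return vqrshrun_n_s16 (acc, 7);
}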
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vshl_n_u8 (uint8x8_t __a, const int __b) +{ + return (uint8x8_t)__builtin_neon_vshl_nv8qi ((int8x8_t) __a, __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vshl_n_u16 (uint16x4_t __a, const int __b) +{ + return (uint16x4_t)__builtin_neon_vshl_nv4hi ((int16x4_t) __a, __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vshl_n_u32 (uint32x2_t __a, const int __b) +{ + return (uint32x2_t)__builtin_neon_vshl_nv2si ((int32x2_t) __a, __b, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vshl_n_u64 (uint64x1_t __a, const int __b) +{ + return (uint64x1_t)__builtin_neon_vshl_ndi ((int64x1_t) __a, __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vshlq_n_s8 (int8x16_t __a, const int __b) +{ + return (int8x16_t)__builtin_neon_vshl_nv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vshlq_n_s16 (int16x8_t __a, const int __b) +{ + return (int16x8_t)__builtin_neon_vshl_nv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vshlq_n_s32 (int32x4_t __a, const int __b) +{ + return (int32x4_t)__builtin_neon_vshl_nv4si (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vshlq_n_s64 (int64x2_t __a, const int __b) +{ + return (int64x2_t)__builtin_neon_vshl_nv2di (__a, __b, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vshlq_n_u8 (uint8x16_t __a, const int __b) +{ + return (uint8x16_t)__builtin_neon_vshl_nv16qi ((int8x16_t) __a, __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vshlq_n_u16 (uint16x8_t __a, const int __b) +{ + return (uint16x8_t)__builtin_neon_vshl_nv8hi ((int16x8_t) __a, __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vshlq_n_u32 (uint32x4_t __a, const int __b) +{ + return (uint32x4_t)__builtin_neon_vshl_nv4si ((int32x4_t) __a, __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vshlq_n_u64 (uint64x2_t __a, const int __b) +{ + return (uint64x2_t)__builtin_neon_vshl_nv2di ((int64x2_t) __a, __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vqshl_n_s8 (int8x8_t __a, const int __b) +{ + return (int8x8_t)__builtin_neon_vqshl_nv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vqshl_n_s16 (int16x4_t __a, const int __b) +{ + return (int16x4_t)__builtin_neon_vqshl_nv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vqshl_n_s32 (int32x2_t __a, const int __b) +{ + return (int32x2_t)__builtin_neon_vqshl_nv2si (__a, __b, 1); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vqshl_n_s64 (int64x1_t __a, const int __b) +{ + return (int64x1_t)__builtin_neon_vqshl_ndi (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vqshl_n_u8 (uint8x8_t __a, const int __b) +{ + return (uint8x8_t)__builtin_neon_vqshl_nv8qi ((int8x8_t) __a, __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vqshl_n_u16 (uint16x4_t __a, const int __b) +{ + return (uint16x4_t)__builtin_neon_vqshl_nv4hi ((int16x4_t) __a, __b, 0); +} + 
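For immediate left shifts the header offers both flavours: vshl_n wraps on overflow, while vqshl_n saturates (and vqshlu_n below saturates a signed input into an unsigned result). A sketch contrasting the two, not part of the patch:

#include <arm_neon.h>

/* Fixed-point scaling by 2^4: the first wraps (VSHL.I16),
   the second clamps to [INT16_MIN, INT16_MAX] (VQSHL.S16).  */
int16x8_t scale_wrap     (int16x8_t v) { return vshlq_n_s16 (v, 4); }
int16x8_t scale_saturate (int16x8_t v) { return vqshlq_n_s16 (v, 4); }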
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vqshl_n_u32 (uint32x2_t __a, const int __b) +{ + return (uint32x2_t)__builtin_neon_vqshl_nv2si ((int32x2_t) __a, __b, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vqshl_n_u64 (uint64x1_t __a, const int __b) +{ + return (uint64x1_t)__builtin_neon_vqshl_ndi ((int64x1_t) __a, __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vqshlq_n_s8 (int8x16_t __a, const int __b) +{ + return (int8x16_t)__builtin_neon_vqshl_nv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vqshlq_n_s16 (int16x8_t __a, const int __b) +{ + return (int16x8_t)__builtin_neon_vqshl_nv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqshlq_n_s32 (int32x4_t __a, const int __b) +{ + return (int32x4_t)__builtin_neon_vqshl_nv4si (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vqshlq_n_s64 (int64x2_t __a, const int __b) +{ + return (int64x2_t)__builtin_neon_vqshl_nv2di (__a, __b, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vqshlq_n_u8 (uint8x16_t __a, const int __b) +{ + return (uint8x16_t)__builtin_neon_vqshl_nv16qi ((int8x16_t) __a, __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vqshlq_n_u16 (uint16x8_t __a, const int __b) +{ + return (uint16x8_t)__builtin_neon_vqshl_nv8hi ((int16x8_t) __a, __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vqshlq_n_u32 (uint32x4_t __a, const int __b) +{ + return (uint32x4_t)__builtin_neon_vqshl_nv4si ((int32x4_t) __a, __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vqshlq_n_u64 (uint64x2_t __a, const int __b) +{ + return (uint64x2_t)__builtin_neon_vqshl_nv2di ((int64x2_t) __a, __b, 0); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vqshlu_n_s8 (int8x8_t __a, const int __b) +{ + return (uint8x8_t)__builtin_neon_vqshlu_nv8qi (__a, __b, 1); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vqshlu_n_s16 (int16x4_t __a, const int __b) +{ + return (uint16x4_t)__builtin_neon_vqshlu_nv4hi (__a, __b, 1); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vqshlu_n_s32 (int32x2_t __a, const int __b) +{ + return (uint32x2_t)__builtin_neon_vqshlu_nv2si (__a, __b, 1); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vqshlu_n_s64 (int64x1_t __a, const int __b) +{ + return (uint64x1_t)__builtin_neon_vqshlu_ndi (__a, __b, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vqshluq_n_s8 (int8x16_t __a, const int __b) +{ + return (uint8x16_t)__builtin_neon_vqshlu_nv16qi (__a, __b, 1); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vqshluq_n_s16 (int16x8_t __a, const int __b) +{ + return (uint16x8_t)__builtin_neon_vqshlu_nv8hi (__a, __b, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vqshluq_n_s32 (int32x4_t __a, const int __b) +{ + return (uint32x4_t)__builtin_neon_vqshlu_nv4si (__a, __b, 1); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vqshluq_n_s64 (int64x2_t __a, const int __b) +{ + return (uint64x2_t)__builtin_neon_vqshlu_nv2di (__a, __b, 
1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vshll_n_s8 (int8x8_t __a, const int __b) +{ + return (int16x8_t)__builtin_neon_vshll_nv8qi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vshll_n_s16 (int16x4_t __a, const int __b) +{ + return (int32x4_t)__builtin_neon_vshll_nv4hi (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vshll_n_s32 (int32x2_t __a, const int __b) +{ + return (int64x2_t)__builtin_neon_vshll_nv2si (__a, __b, 1); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vshll_n_u8 (uint8x8_t __a, const int __b) +{ + return (uint16x8_t)__builtin_neon_vshll_nv8qi ((int8x8_t) __a, __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vshll_n_u16 (uint16x4_t __a, const int __b) +{ + return (uint32x4_t)__builtin_neon_vshll_nv4hi ((int16x4_t) __a, __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vshll_n_u32 (uint32x2_t __a, const int __b) +{ + return (uint64x2_t)__builtin_neon_vshll_nv2si ((int32x2_t) __a, __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vsra_n_s8 (int8x8_t __a, int8x8_t __b, const int __c) +{ + return (int8x8_t)__builtin_neon_vsra_nv8qi (__a, __b, __c, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vsra_n_s16 (int16x4_t __a, int16x4_t __b, const int __c) +{ + return (int16x4_t)__builtin_neon_vsra_nv4hi (__a, __b, __c, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vsra_n_s32 (int32x2_t __a, int32x2_t __b, const int __c) +{ + return (int32x2_t)__builtin_neon_vsra_nv2si (__a, __b, __c, 1); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vsra_n_s64 (int64x1_t __a, int64x1_t __b, const int __c) +{ + return (int64x1_t)__builtin_neon_vsra_ndi (__a, __b, __c, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vsra_n_u8 (uint8x8_t __a, uint8x8_t __b, const int __c) +{ + return (uint8x8_t)__builtin_neon_vsra_nv8qi ((int8x8_t) __a, (int8x8_t) __b, __c, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vsra_n_u16 (uint16x4_t __a, uint16x4_t __b, const int __c) +{ + return (uint16x4_t)__builtin_neon_vsra_nv4hi ((int16x4_t) __a, (int16x4_t) __b, __c, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vsra_n_u32 (uint32x2_t __a, uint32x2_t __b, const int __c) +{ + return (uint32x2_t)__builtin_neon_vsra_nv2si ((int32x2_t) __a, (int32x2_t) __b, __c, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vsra_n_u64 (uint64x1_t __a, uint64x1_t __b, const int __c) +{ + return (uint64x1_t)__builtin_neon_vsra_ndi ((int64x1_t) __a, (int64x1_t) __b, __c, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vsraq_n_s8 (int8x16_t __a, int8x16_t __b, const int __c) +{ + return (int8x16_t)__builtin_neon_vsra_nv16qi (__a, __b, __c, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vsraq_n_s16 (int16x8_t __a, int16x8_t __b, const int __c) +{ + return (int16x8_t)__builtin_neon_vsra_nv8hi (__a, __b, __c, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vsraq_n_s32 (int32x4_t __a, int32x4_t __b, const int __c) +{ + return 
(int32x4_t)__builtin_neon_vsra_nv4si (__a, __b, __c, 1);
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vsraq_n_s64 (int64x2_t __a, int64x2_t __b, const int __c)
+{
+  return (int64x2_t)__builtin_neon_vsra_nv2di (__a, __b, __c, 1);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vsraq_n_u8 (uint8x16_t __a, uint8x16_t __b, const int __c)
+{
+  return (uint8x16_t)__builtin_neon_vsra_nv16qi ((int8x16_t) __a, (int8x16_t) __b, __c, 0);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vsraq_n_u16 (uint16x8_t __a, uint16x8_t __b, const int __c)
+{
+  return (uint16x8_t)__builtin_neon_vsra_nv8hi ((int16x8_t) __a, (int16x8_t) __b, __c, 0);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vsraq_n_u32 (uint32x4_t __a, uint32x4_t __b, const int __c)
+{
+  return (uint32x4_t)__builtin_neon_vsra_nv4si ((int32x4_t) __a, (int32x4_t) __b, __c, 0);
+}
+
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vsraq_n_u64 (uint64x2_t __a, uint64x2_t __b, const int __c)
+{
+  return (uint64x2_t)__builtin_neon_vsra_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c, 0);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vrsra_n_s8 (int8x8_t __a, int8x8_t __b, const int __c)
+{
+  return (int8x8_t)__builtin_neon_vsra_nv8qi (__a, __b, __c, 5);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vrsra_n_s16 (int16x4_t __a, int16x4_t __b, const int __c)
+{
+  return (int16x4_t)__builtin_neon_vsra_nv4hi (__a, __b, __c, 5);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vrsra_n_s32 (int32x2_t __a, int32x2_t __b, const int __c)
+{
+  return (int32x2_t)__builtin_neon_vsra_nv2si (__a, __b, __c, 5);
+}
+
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vrsra_n_s64 (int64x1_t __a, int64x1_t __b, const int __c)
+{
+  return (int64x1_t)__builtin_neon_vsra_ndi (__a, __b, __c, 5);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vrsra_n_u8 (uint8x8_t __a, uint8x8_t __b, const int __c)
+{
+  return (uint8x8_t)__builtin_neon_vsra_nv8qi ((int8x8_t) __a, (int8x8_t) __b, __c, 4);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vrsra_n_u16 (uint16x4_t __a, uint16x4_t __b, const int __c)
+{
+  return (uint16x4_t)__builtin_neon_vsra_nv4hi ((int16x4_t) __a, (int16x4_t) __b, __c, 4);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vrsra_n_u32 (uint32x2_t __a, uint32x2_t __b, const int __c)
+{
+  return (uint32x2_t)__builtin_neon_vsra_nv2si ((int32x2_t) __a, (int32x2_t) __b, __c, 4);
+}
+
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vrsra_n_u64 (uint64x1_t __a, uint64x1_t __b, const int __c)
+{
+  return (uint64x1_t)__builtin_neon_vsra_ndi ((int64x1_t) __a, (int64x1_t) __b, __c, 4);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vrsraq_n_s8 (int8x16_t __a, int8x16_t __b, const int __c)
+{
+  return (int8x16_t)__builtin_neon_vsra_nv16qi (__a, __b, __c, 5);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vrsraq_n_s16 (int16x8_t __a, int16x8_t __b, const int __c)
+{
+  return (int16x8_t)__builtin_neon_vsra_nv8hi (__a, __b, __c, 5);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vrsraq_n_s32 (int32x4_t __a, int32x4_t __b, const int __c)
+{
+  return (int32x4_t)__builtin_neon_vsra_nv4si (__a, __b, __c, 5);
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vrsraq_n_s64 (int64x2_t __a, int64x2_t __b, const int __c)
+{
+  return (int64x2_t)__builtin_neon_vsra_nv2di (__a, __b, __c, 5);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vrsraq_n_u8 (uint8x16_t __a, uint8x16_t __b, const int __c)
+{
+  return (uint8x16_t)__builtin_neon_vsra_nv16qi ((int8x16_t) __a, (int8x16_t) __b, __c, 4);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vrsraq_n_u16 (uint16x8_t __a, uint16x8_t __b, const int __c)
+{
+  return (uint16x8_t)__builtin_neon_vsra_nv8hi ((int16x8_t) __a, (int16x8_t) __b, __c, 4);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vrsraq_n_u32 (uint32x4_t __a, uint32x4_t __b, const int __c)
+{
+  return (uint32x4_t)__builtin_neon_vsra_nv4si ((int32x4_t) __a, (int32x4_t) __b, __c, 4);
+}
+
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vrsraq_n_u64 (uint64x2_t __a, uint64x2_t __b, const int __c)
+{
+  return (uint64x2_t)__builtin_neon_vsra_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c, 4);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vsri_n_s8 (int8x8_t __a, int8x8_t __b, const int __c)
+{
+  return (int8x8_t)__builtin_neon_vsri_nv8qi (__a, __b, __c);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vsri_n_s16 (int16x4_t __a, int16x4_t __b, const int __c)
+{
+  return (int16x4_t)__builtin_neon_vsri_nv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vsri_n_s32 (int32x2_t __a, int32x2_t __b, const int __c)
+{
+  return (int32x2_t)__builtin_neon_vsri_nv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vsri_n_s64 (int64x1_t __a, int64x1_t __b, const int __c)
+{
+  return (int64x1_t)__builtin_neon_vsri_ndi (__a, __b, __c);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vsri_n_u8 (uint8x8_t __a, uint8x8_t __b, const int __c)
+{
+  return (uint8x8_t)__builtin_neon_vsri_nv8qi ((int8x8_t) __a, (int8x8_t) __b, __c);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vsri_n_u16 (uint16x4_t __a, uint16x4_t __b, const int __c)
+{
+  return (uint16x4_t)__builtin_neon_vsri_nv4hi ((int16x4_t) __a, (int16x4_t) __b, __c);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vsri_n_u32 (uint32x2_t __a, uint32x2_t __b, const int __c)
+{
+  return (uint32x2_t)__builtin_neon_vsri_nv2si ((int32x2_t) __a, (int32x2_t) __b, __c);
+}
+
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vsri_n_u64 (uint64x1_t __a, uint64x1_t __b, const int __c)
+{
+  return (uint64x1_t)__builtin_neon_vsri_ndi ((int64x1_t) __a, (int64x1_t) __b, __c);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vsri_n_p8 (poly8x8_t __a, poly8x8_t __b, const int __c)
+{
+  return (poly8x8_t)__builtin_neon_vsri_nv8qi ((int8x8_t) __a, (int8x8_t) __b, __c);
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vsri_n_p16 (poly16x4_t __a, poly16x4_t __b, const int __c)
+{
+  return (poly16x4_t)__builtin_neon_vsri_nv4hi ((int16x4_t) __a, (int16x4_t) __b, __c);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vsriq_n_s8 (int8x16_t __a, int8x16_t __b, const int __c)
+{
+  return (int8x16_t)__builtin_neon_vsri_nv16qi (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vsriq_n_s16 (int16x8_t __a, int16x8_t __b, const int __c)
+{
+  return (int16x8_t)__builtin_neon_vsri_nv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vsriq_n_s32 (int32x4_t __a, int32x4_t __b, const int __c)
+{
+  return (int32x4_t)__builtin_neon_vsri_nv4si (__a, __b, __c);
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vsriq_n_s64 (int64x2_t __a, int64x2_t __b, const int __c)
+{
+  return (int64x2_t)__builtin_neon_vsri_nv2di (__a, __b, __c);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vsriq_n_u8 (uint8x16_t __a, uint8x16_t __b, const int __c)
+{
+  return (uint8x16_t)__builtin_neon_vsri_nv16qi ((int8x16_t) __a, (int8x16_t) __b, __c);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vsriq_n_u16 (uint16x8_t __a, uint16x8_t __b, const int __c)
+{
+  return (uint16x8_t)__builtin_neon_vsri_nv8hi ((int16x8_t) __a, (int16x8_t) __b, __c);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vsriq_n_u32 (uint32x4_t __a, uint32x4_t __b, const int __c)
+{
+  return (uint32x4_t)__builtin_neon_vsri_nv4si ((int32x4_t) __a, (int32x4_t) __b, __c);
+}
+
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vsriq_n_u64 (uint64x2_t __a, uint64x2_t __b, const int __c)
+{
+  return (uint64x2_t)__builtin_neon_vsri_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vsriq_n_p8 (poly8x16_t __a, poly8x16_t __b, const int __c)
+{
+  return (poly8x16_t)__builtin_neon_vsri_nv16qi ((int8x16_t) __a, (int8x16_t) __b, __c);
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vsriq_n_p16 (poly16x8_t __a, poly16x8_t __b, const int __c)
+{
+  return (poly16x8_t)__builtin_neon_vsri_nv8hi ((int16x8_t) __a, (int16x8_t) __b, __c);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vsli_n_s8 (int8x8_t __a, int8x8_t __b, const int __c)
+{
+  return (int8x8_t)__builtin_neon_vsli_nv8qi (__a, __b, __c);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vsli_n_s16 (int16x4_t __a, int16x4_t __b, const int __c)
+{
+  return (int16x4_t)__builtin_neon_vsli_nv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vsli_n_s32 (int32x2_t __a, int32x2_t __b, const int __c)
+{
+  return (int32x2_t)__builtin_neon_vsli_nv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vsli_n_s64 (int64x1_t __a, int64x1_t __b, const int __c)
+{
+  return (int64x1_t)__builtin_neon_vsli_ndi (__a, __b, __c);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vsli_n_u8 (uint8x8_t __a, uint8x8_t __b, const int __c)
+{
+  return (uint8x8_t)__builtin_neon_vsli_nv8qi ((int8x8_t) __a, (int8x8_t) __b, __c);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vsli_n_u16 (uint16x4_t __a, uint16x4_t __b, const int __c)
+{
+  return (uint16x4_t)__builtin_neon_vsli_nv4hi ((int16x4_t) __a, (int16x4_t) __b, __c);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vsli_n_u32 (uint32x2_t __a, uint32x2_t __b, const int __c)
+{
+  return (uint32x2_t)__builtin_neon_vsli_nv2si ((int32x2_t) __a, (int32x2_t) __b, __c);
+}
+
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vsli_n_u64 (uint64x1_t __a, uint64x1_t __b, const int __c)
+{
+  return (uint64x1_t)__builtin_neon_vsli_ndi ((int64x1_t) __a, (int64x1_t) __b, __c);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vsli_n_p8 (poly8x8_t __a, poly8x8_t __b, const int __c)
+{
+  return (poly8x8_t)__builtin_neon_vsli_nv8qi ((int8x8_t) __a, (int8x8_t) __b, __c);
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vsli_n_p16 (poly16x4_t __a, poly16x4_t __b, const int __c)
+{
+  return (poly16x4_t)__builtin_neon_vsli_nv4hi ((int16x4_t) __a, (int16x4_t) __b, __c);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vsliq_n_s8 (int8x16_t __a, int8x16_t __b, const int __c)
+{
+  return (int8x16_t)__builtin_neon_vsli_nv16qi (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vsliq_n_s16 (int16x8_t __a, int16x8_t __b, const int __c)
+{
+  return (int16x8_t)__builtin_neon_vsli_nv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vsliq_n_s32 (int32x4_t __a, int32x4_t __b, const int __c)
+{
+  return (int32x4_t)__builtin_neon_vsli_nv4si (__a, __b, __c);
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vsliq_n_s64 (int64x2_t __a, int64x2_t __b, const int __c)
+{
+  return (int64x2_t)__builtin_neon_vsli_nv2di (__a, __b, __c);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vsliq_n_u8 (uint8x16_t __a, uint8x16_t __b, const int __c)
+{
+  return (uint8x16_t)__builtin_neon_vsli_nv16qi ((int8x16_t) __a, (int8x16_t) __b, __c);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vsliq_n_u16 (uint16x8_t __a, uint16x8_t __b, const int __c)
+{
+  return (uint16x8_t)__builtin_neon_vsli_nv8hi ((int16x8_t) __a, (int16x8_t) __b, __c);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vsliq_n_u32 (uint32x4_t __a, uint32x4_t __b, const int __c)
+{
+  return (uint32x4_t)__builtin_neon_vsli_nv4si ((int32x4_t) __a, (int32x4_t) __b, __c);
+}
+
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vsliq_n_u64 (uint64x2_t __a, uint64x2_t __b, const int __c)
+{
+  return (uint64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vsliq_n_p8 (poly8x16_t __a, poly8x16_t __b, const int __c)
+{
+  return (poly8x16_t)__builtin_neon_vsli_nv16qi ((int8x16_t) __a, (int8x16_t) __b, __c);
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vsliq_n_p16 (poly16x8_t __a, poly16x8_t __b, const int __c)
+{
+  return (poly16x8_t)__builtin_neon_vsli_nv8hi ((int16x8_t) __a, (int16x8_t) __b, __c);
+}
+
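/* Editor's illustration (not part of this patch): vsra_n accumulates a
   right-shifted operand, vrsra_n does the same with rounding, and
   vsli_n/vsri_n shift-and-insert while preserving the bits the shift leaves
   uncovered.  A minimal sketch, assuming a NEON-enabled target, that packs
   a field into the top bits of each byte lane with vsli_n_u8:

     uint8x8_t pack_field (uint8x8_t __dst, uint8x8_t __field)
     {
       // per lane: bits 3..7 become (__field << 3), bits 0..2 of __dst kept
       return vsli_n_u8 (__dst, __field, 3);
     }
*/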
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vabs_s8 (int8x8_t __a)
+{
+  return (int8x8_t)__builtin_neon_vabsv8qi (__a, 1);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vabs_s16 (int16x4_t __a)
+{
+  return (int16x4_t)__builtin_neon_vabsv4hi (__a, 1);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vabs_s32 (int32x2_t __a)
+{
+  return (int32x2_t)__builtin_neon_vabsv2si (__a, 1);
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vabs_f32 (float32x2_t __a)
+{
+  return (float32x2_t)__builtin_neon_vabsv2sf (__a, 3);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vabsq_s8 (int8x16_t __a)
+{
+  return (int8x16_t)__builtin_neon_vabsv16qi (__a, 1);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vabsq_s16 (int16x8_t __a)
+{
+  return (int16x8_t)__builtin_neon_vabsv8hi (__a, 1);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vabsq_s32 (int32x4_t __a)
+{
+  return (int32x4_t)__builtin_neon_vabsv4si (__a, 1);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vabsq_f32 (float32x4_t __a)
+{
+  return (float32x4_t)__builtin_neon_vabsv4sf (__a, 3);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vqabs_s8 (int8x8_t __a)
+{
+  return (int8x8_t)__builtin_neon_vqabsv8qi (__a, 1);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqabs_s16 (int16x4_t __a)
+{
+  return (int16x4_t)__builtin_neon_vqabsv4hi (__a, 1);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqabs_s32 (int32x2_t __a)
+{
+  return (int32x2_t)__builtin_neon_vqabsv2si (__a, 1);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vqabsq_s8 (int8x16_t __a)
+{
+  return (int8x16_t)__builtin_neon_vqabsv16qi (__a, 1);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqabsq_s16 (int16x8_t __a)
+{
+  return (int16x8_t)__builtin_neon_vqabsv8hi (__a, 1);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqabsq_s32 (int32x4_t __a)
+{
+  return (int32x4_t)__builtin_neon_vqabsv4si (__a, 1);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vneg_s8 (int8x8_t __a)
+{
+  return (int8x8_t)__builtin_neon_vnegv8qi (__a, 1);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vneg_s16 (int16x4_t __a)
+{
+  return (int16x4_t)__builtin_neon_vnegv4hi (__a, 1);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vneg_s32 (int32x2_t __a)
+{
+  return (int32x2_t)__builtin_neon_vnegv2si (__a, 1);
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vneg_f32 (float32x2_t __a)
+{
+  return (float32x2_t)__builtin_neon_vnegv2sf (__a, 3);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vnegq_s8 (int8x16_t __a)
+{
+  return (int8x16_t)__builtin_neon_vnegv16qi (__a, 1);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vnegq_s16 (int16x8_t __a)
+{
+  return (int16x8_t)__builtin_neon_vnegv8hi (__a, 1);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vnegq_s32 (int32x4_t __a)
+{
+  return (int32x4_t)__builtin_neon_vnegv4si (__a, 1);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vnegq_f32 (float32x4_t __a)
+{
+  return (float32x4_t)__builtin_neon_vnegv4sf (__a, 3);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vqneg_s8 (int8x8_t __a)
+{
+  return (int8x8_t)__builtin_neon_vqnegv8qi (__a, 1);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqneg_s16 (int16x4_t __a)
+{
+  return (int16x4_t)__builtin_neon_vqnegv4hi (__a, 1);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqneg_s32 (int32x2_t __a)
+{
+  return (int32x2_t)__builtin_neon_vqnegv2si (__a, 1);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vqnegq_s8 (int8x16_t __a)
+{
+  return (int8x16_t)__builtin_neon_vqnegv16qi (__a, 1);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqnegq_s16 (int16x8_t __a)
+{
+  return (int16x8_t)__builtin_neon_vqnegv8hi (__a, 1);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqnegq_s32 (int32x4_t __a)
+{
+  return (int32x4_t)__builtin_neon_vqnegv4si (__a, 1);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vmvn_s8 (int8x8_t __a)
+{
+  return (int8x8_t)__builtin_neon_vmvnv8qi (__a, 1);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vmvn_s16 (int16x4_t __a)
+{
+  return (int16x4_t)__builtin_neon_vmvnv4hi (__a, 1);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vmvn_s32 (int32x2_t __a)
+{
+  return (int32x2_t)__builtin_neon_vmvnv2si (__a, 1);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vmvn_u8 (uint8x8_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vmvnv8qi ((int8x8_t) __a, 0);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vmvn_u16 (uint16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vmvnv4hi ((int16x4_t) __a, 0);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vmvn_u32 (uint32x2_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vmvnv2si ((int32x2_t) __a, 0);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vmvn_p8 (poly8x8_t __a)
+{
+  return (poly8x8_t)__builtin_neon_vmvnv8qi ((int8x8_t) __a, 2);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vmvnq_s8 (int8x16_t __a)
+{
+  return (int8x16_t)__builtin_neon_vmvnv16qi (__a, 1);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vmvnq_s16 (int16x8_t __a)
+{
+  return (int16x8_t)__builtin_neon_vmvnv8hi (__a, 1);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vmvnq_s32 (int32x4_t __a)
+{
+  return (int32x4_t)__builtin_neon_vmvnv4si (__a, 1);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vmvnq_u8 (uint8x16_t __a)
+{
+  return (uint8x16_t)__builtin_neon_vmvnv16qi ((int8x16_t) __a, 0);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vmvnq_u16 (uint16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vmvnv8hi ((int16x8_t) __a, 0);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vmvnq_u32 (uint32x4_t __a)
+{
+  return (uint32x4_t)__builtin_neon_vmvnv4si ((int32x4_t) __a, 0);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vmvnq_p8 (poly8x16_t __a)
+{
+  return (poly8x16_t)__builtin_neon_vmvnv16qi ((int8x16_t) __a, 2);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vcls_s8 (int8x8_t __a)
+{
+  return (int8x8_t)__builtin_neon_vclsv8qi (__a, 1);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcls_s16 (int16x4_t __a)
+{
+  return (int16x4_t)__builtin_neon_vclsv4hi (__a, 1);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vcls_s32 (int32x2_t __a)
+{
+  return (int32x2_t)__builtin_neon_vclsv2si (__a, 1);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vclsq_s8 (int8x16_t __a)
+{
+  return (int8x16_t)__builtin_neon_vclsv16qi (__a, 1);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vclsq_s16 (int16x8_t __a)
+{
+  return (int16x8_t)__builtin_neon_vclsv8hi (__a, 1);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vclsq_s32 (int32x4_t __a)
+{
+  return (int32x4_t)__builtin_neon_vclsv4si (__a, 1);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vclz_s8 (int8x8_t __a)
+{
+  return (int8x8_t)__builtin_neon_vclzv8qi (__a, 1);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vclz_s16 (int16x4_t __a)
+{
+  return (int16x4_t)__builtin_neon_vclzv4hi (__a, 1);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vclz_s32 (int32x2_t __a)
+{
+  return (int32x2_t)__builtin_neon_vclzv2si (__a, 1);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vclz_u8 (uint8x8_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vclzv8qi ((int8x8_t) __a, 0);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vclz_u16 (uint16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vclzv4hi ((int16x4_t) __a, 0);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vclz_u32 (uint32x2_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vclzv2si ((int32x2_t) __a, 0);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vclzq_s8 (int8x16_t __a)
+{
+  return (int8x16_t)__builtin_neon_vclzv16qi (__a, 1);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vclzq_s16 (int16x8_t __a)
+{
+  return (int16x8_t)__builtin_neon_vclzv8hi (__a, 1);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vclzq_s32 (int32x4_t __a)
+{
+  return (int32x4_t)__builtin_neon_vclzv4si (__a, 1);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vclzq_u8 (uint8x16_t __a)
+{
+  return (uint8x16_t)__builtin_neon_vclzv16qi ((int8x16_t) __a, 0);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vclzq_u16 (uint16x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vclzv8hi ((int16x8_t) __a, 0);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vclzq_u32 (uint32x4_t __a)
+{
+  return (uint32x4_t)__builtin_neon_vclzv4si ((int32x4_t) __a, 0);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vcnt_s8 (int8x8_t __a)
+{
+  return (int8x8_t)__builtin_neon_vcntv8qi (__a, 1);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vcnt_u8 (uint8x8_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vcntv8qi ((int8x8_t) __a, 0);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vcnt_p8 (poly8x8_t __a)
+{
+  return (poly8x8_t)__builtin_neon_vcntv8qi ((int8x8_t) __a, 2);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vcntq_s8 (int8x16_t __a)
+{
+  return (int8x16_t)__builtin_neon_vcntv16qi (__a, 1);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vcntq_u8 (uint8x16_t __a)
+{
+  return (uint8x16_t)__builtin_neon_vcntv16qi ((int8x16_t) __a, 0);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vcntq_p8 (poly8x16_t __a)
+{
+  return (poly8x16_t)__builtin_neon_vcntv16qi ((int8x16_t) __a, 2);
+}
+
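/* Editor's illustration (not part of this patch): vcnt_u8 gives a per-byte
   population count.  A sketch of a 64-bit popcount; vpaddl_u8/vpaddl_u16
   (pairwise-add-and-widen) are assumed to be defined elsewhere in this
   header:

     uint32_t popcount64 (uint8x8_t __v)
     {
       uint8x8_t  __c  = vcnt_u8 (__v);      // set bits per byte
       uint16x4_t __c2 = vpaddl_u8 (__c);    // pairwise add into 16-bit lanes
       uint32x2_t __c4 = vpaddl_u16 (__c2);  // pairwise add into 32-bit lanes
       return vget_lane_u32 (__c4, 0) + vget_lane_u32 (__c4, 1);
     }
*/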
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vrecpe_f32 (float32x2_t __a)
+{
+  return (float32x2_t)__builtin_neon_vrecpev2sf (__a, 3);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vrecpe_u32 (uint32x2_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vrecpev2si ((int32x2_t) __a, 0);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vrecpeq_f32 (float32x4_t __a)
+{
+  return (float32x4_t)__builtin_neon_vrecpev4sf (__a, 3);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vrecpeq_u32 (uint32x4_t __a)
+{
+  return (uint32x4_t)__builtin_neon_vrecpev4si ((int32x4_t) __a, 0);
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vrsqrte_f32 (float32x2_t __a)
+{
+  return (float32x2_t)__builtin_neon_vrsqrtev2sf (__a, 3);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vrsqrte_u32 (uint32x2_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vrsqrtev2si ((int32x2_t) __a, 0);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vrsqrteq_f32 (float32x4_t __a)
+{
+  return (float32x4_t)__builtin_neon_vrsqrtev4sf (__a, 3);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vrsqrteq_u32 (uint32x4_t __a)
+{
+  return (uint32x4_t)__builtin_neon_vrsqrtev4si ((int32x4_t) __a, 0);
+}
+
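/* Editor's illustration (not part of this patch): vrecpe_f32 is a
   low-precision estimate; the usual pattern refines it with Newton-Raphson
   steps, assuming vrecps_f32 and vmul_f32 defined elsewhere in this header:

     float32x2_t reciprocal (float32x2_t __d)
     {
       float32x2_t __x = vrecpe_f32 (__d);            // initial estimate
       __x = vmul_f32 (vrecps_f32 (__d, __x), __x);   // refinement step 1
       __x = vmul_f32 (vrecps_f32 (__d, __x), __x);   // refinement step 2
       return __x;
     }
*/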
+__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+vget_lane_s8 (int8x8_t __a, const int __b)
+{
+  return (int8_t)__builtin_neon_vget_lanev8qi (__a, __b, 1);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vget_lane_s16 (int16x4_t __a, const int __b)
+{
+  return (int16_t)__builtin_neon_vget_lanev4hi (__a, __b, 1);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vget_lane_s32 (int32x2_t __a, const int __b)
+{
+  return (int32_t)__builtin_neon_vget_lanev2si (__a, __b, 1);
+}
+
+__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+vget_lane_f32 (float32x2_t __a, const int __b)
+{
+  return (float32_t)__builtin_neon_vget_lanev2sf (__a, __b, 3);
+}
+
+__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+vget_lane_u8 (uint8x8_t __a, const int __b)
+{
+  return (uint8_t)__builtin_neon_vget_lanev8qi ((int8x8_t) __a, __b, 0);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vget_lane_u16 (uint16x4_t __a, const int __b)
+{
+  return (uint16_t)__builtin_neon_vget_lanev4hi ((int16x4_t) __a, __b, 0);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vget_lane_u32 (uint32x2_t __a, const int __b)
+{
+  return (uint32_t)__builtin_neon_vget_lanev2si ((int32x2_t) __a, __b, 0);
+}
+
+__extension__ static __inline poly8_t __attribute__ ((__always_inline__))
+vget_lane_p8 (poly8x8_t __a, const int __b)
+{
+  return (poly8_t)__builtin_neon_vget_lanev8qi ((int8x8_t) __a, __b, 2);
+}
+
+__extension__ static __inline poly16_t __attribute__ ((__always_inline__))
+vget_lane_p16 (poly16x4_t __a, const int __b)
+{
+  return (poly16_t)__builtin_neon_vget_lanev4hi ((int16x4_t) __a, __b, 2);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vget_lane_s64 (int64x1_t __a, const int __b)
+{
+  return (int64_t)__builtin_neon_vget_lanedi (__a, __b, 1);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vget_lane_u64 (uint64x1_t __a, const int __b)
+{
+  return (uint64_t)__builtin_neon_vget_lanedi ((int64x1_t) __a, __b, 0);
+}
+
+__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+vgetq_lane_s8 (int8x16_t __a, const int __b)
+{
+  return (int8_t)__builtin_neon_vget_lanev16qi (__a, __b, 1);
+}
+
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vgetq_lane_s16 (int16x8_t __a, const int __b)
+{
+  return (int16_t)__builtin_neon_vget_lanev8hi (__a, __b, 1);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vgetq_lane_s32 (int32x4_t __a, const int __b)
+{
+  return (int32_t)__builtin_neon_vget_lanev4si (__a, __b, 1);
+}
+
+__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+vgetq_lane_f32 (float32x4_t __a, const int __b)
+{
+  return (float32_t)__builtin_neon_vget_lanev4sf (__a, __b, 3);
+}
+
+__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+vgetq_lane_u8 (uint8x16_t __a, const int __b)
+{
+  return (uint8_t)__builtin_neon_vget_lanev16qi ((int8x16_t) __a, __b, 0);
+}
+
+__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+vgetq_lane_u16 (uint16x8_t __a, const int __b)
+{
+  return (uint16_t)__builtin_neon_vget_lanev8hi ((int16x8_t) __a, __b, 0);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vgetq_lane_u32 (uint32x4_t __a, const int __b)
+{
+  return (uint32_t)__builtin_neon_vget_lanev4si ((int32x4_t) __a, __b, 0);
+}
+
+__extension__ static __inline poly8_t __attribute__ ((__always_inline__))
+vgetq_lane_p8 (poly8x16_t __a, const int __b)
+{
+  return (poly8_t)__builtin_neon_vget_lanev16qi ((int8x16_t) __a, __b, 2);
+}
+
+__extension__ static __inline poly16_t __attribute__ ((__always_inline__))
+vgetq_lane_p16 (poly16x8_t __a, const int __b)
+{
+  return (poly16_t)__builtin_neon_vget_lanev8hi ((int16x8_t) __a, __b, 2);
+}
+
+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+vgetq_lane_s64 (int64x2_t __a, const int __b)
+{
+  return (int64_t)__builtin_neon_vget_lanev2di (__a, __b, 1);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vgetq_lane_u64 (uint64x2_t __a, const int __b)
+{
+  return (uint64_t)__builtin_neon_vget_lanev2di ((int64x2_t) __a, __b, 0);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vset_lane_s8 (int8_t __a, int8x8_t __b, const int __c)
+{
+  return (int8x8_t)__builtin_neon_vset_lanev8qi ((__builtin_neon_qi) __a, __b, __c);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vset_lane_s16 (int16_t __a, int16x4_t __b, const int __c)
+{
+  return (int16x4_t)__builtin_neon_vset_lanev4hi ((__builtin_neon_hi) __a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vset_lane_s32 (int32_t __a, int32x2_t __b, const int __c)
+{
+  return (int32x2_t)__builtin_neon_vset_lanev2si ((__builtin_neon_si) __a, __b, __c);
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vset_lane_f32 (float32_t __a, float32x2_t __b, const int __c)
+{
+  return (float32x2_t)__builtin_neon_vset_lanev2sf (__a, __b, __c);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vset_lane_u8 (uint8_t __a, uint8x8_t __b, const int __c)
+{
+  return (uint8x8_t)__builtin_neon_vset_lanev8qi ((__builtin_neon_qi) __a, (int8x8_t) __b, __c);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vset_lane_u16 (uint16_t __a, uint16x4_t __b, const int __c)
+{
+  return (uint16x4_t)__builtin_neon_vset_lanev4hi ((__builtin_neon_hi) __a, (int16x4_t) __b, __c);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vset_lane_u32 (uint32_t __a, uint32x2_t __b, const int __c)
+{
+  return (uint32x2_t)__builtin_neon_vset_lanev2si ((__builtin_neon_si) __a, (int32x2_t) __b, __c);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vset_lane_p8 (poly8_t __a, poly8x8_t __b, const int __c)
+{
+  return (poly8x8_t)__builtin_neon_vset_lanev8qi ((__builtin_neon_qi) __a, (int8x8_t) __b, __c);
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vset_lane_p16 (poly16_t __a, poly16x4_t __b, const int __c)
+{
+  return (poly16x4_t)__builtin_neon_vset_lanev4hi ((__builtin_neon_hi) __a, (int16x4_t) __b, __c);
+}
+
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vset_lane_s64 (int64_t __a, int64x1_t __b, const int __c)
+{
+  return (int64x1_t)__builtin_neon_vset_lanedi ((__builtin_neon_di) __a, __b, __c);
+}
+
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vset_lane_u64 (uint64_t __a, uint64x1_t __b, const int __c)
+{
+  return (uint64x1_t)__builtin_neon_vset_lanedi ((__builtin_neon_di) __a, (int64x1_t) __b, __c);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vsetq_lane_s8 (int8_t __a, int8x16_t __b, const int __c)
+{
+  return (int8x16_t)__builtin_neon_vset_lanev16qi ((__builtin_neon_qi) __a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vsetq_lane_s16 (int16_t __a, int16x8_t __b, const int __c)
+{
+  return (int16x8_t)__builtin_neon_vset_lanev8hi ((__builtin_neon_hi) __a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vsetq_lane_s32 (int32_t __a, int32x4_t __b, const int __c)
+{
+  return (int32x4_t)__builtin_neon_vset_lanev4si ((__builtin_neon_si) __a, __b, __c);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vsetq_lane_f32 (float32_t __a, float32x4_t __b, const int __c)
+{
+  return (float32x4_t)__builtin_neon_vset_lanev4sf (__a, __b, __c);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vsetq_lane_u8 (uint8_t __a, uint8x16_t __b, const int __c)
+{
+  return (uint8x16_t)__builtin_neon_vset_lanev16qi ((__builtin_neon_qi) __a, (int8x16_t) __b, __c);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vsetq_lane_u16 (uint16_t __a, uint16x8_t __b, const int __c)
+{
+  return (uint16x8_t)__builtin_neon_vset_lanev8hi ((__builtin_neon_hi) __a, (int16x8_t) __b, __c);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vsetq_lane_u32 (uint32_t __a, uint32x4_t __b, const int __c)
+{
+  return (uint32x4_t)__builtin_neon_vset_lanev4si ((__builtin_neon_si) __a, (int32x4_t) __b, __c);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vsetq_lane_p8 (poly8_t __a, poly8x16_t __b, const int __c)
+{
+  return (poly8x16_t)__builtin_neon_vset_lanev16qi ((__builtin_neon_qi) __a, (int8x16_t) __b, __c);
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vsetq_lane_p16 (poly16_t __a, poly16x8_t __b, const int __c)
+{
+  return (poly16x8_t)__builtin_neon_vset_lanev8hi ((__builtin_neon_hi) __a, (int16x8_t) __b, __c);
+}
+
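/* Editor's illustration (not part of this patch): lane numbers for the
   get/set-lane intrinsics must be compile-time constants.  A sketch that
   reads one lane of a q register and writes it back modified:

     float32x4_t bump_lane0 (float32x4_t __v)
     {
       float32_t __x = vgetq_lane_f32 (__v, 0);      // extract lane 0
       return vsetq_lane_f32 (__x + 1.0f, __v, 0);   // rewrite lane 0
     }
*/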
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vsetq_lane_s64 (int64_t __a, int64x2_t __b, const int __c)
+{
+  return (int64x2_t)__builtin_neon_vset_lanev2di ((__builtin_neon_di) __a, __b, __c);
+}
+
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vsetq_lane_u64 (uint64_t __a, uint64x2_t __b, const int __c)
+{
+  return (uint64x2_t)__builtin_neon_vset_lanev2di ((__builtin_neon_di) __a, (int64x2_t) __b, __c);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vcreate_s8 (uint64_t __a)
+{
+  return (int8x8_t)__builtin_neon_vcreatev8qi ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcreate_s16 (uint64_t __a)
+{
+  return (int16x4_t)__builtin_neon_vcreatev4hi ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vcreate_s32 (uint64_t __a)
+{
+  return (int32x2_t)__builtin_neon_vcreatev2si ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vcreate_s64 (uint64_t __a)
+{
+  return (int64x1_t)__builtin_neon_vcreatedi ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vcreate_f32 (uint64_t __a)
+{
+  return (float32x2_t)__builtin_neon_vcreatev2sf ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vcreate_u8 (uint64_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vcreatev8qi ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcreate_u16 (uint64_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vcreatev4hi ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vcreate_u32 (uint64_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vcreatev2si ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vcreate_u64 (uint64_t __a)
+{
+  return (uint64x1_t)__builtin_neon_vcreatedi ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vcreate_p8 (uint64_t __a)
+{
+  return (poly8x8_t)__builtin_neon_vcreatev8qi ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vcreate_p16 (uint64_t __a)
+{
+  return (poly16x4_t)__builtin_neon_vcreatev4hi ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vdup_n_s8 (int8_t __a)
+{
+  return (int8x8_t)__builtin_neon_vdup_nv8qi ((__builtin_neon_qi) __a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vdup_n_s16 (int16_t __a)
+{
+  return (int16x4_t)__builtin_neon_vdup_nv4hi ((__builtin_neon_hi) __a);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vdup_n_s32 (int32_t __a)
+{
+  return (int32x2_t)__builtin_neon_vdup_nv2si ((__builtin_neon_si) __a);
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vdup_n_f32 (float32_t __a)
+{
+  return (float32x2_t)__builtin_neon_vdup_nv2sf (__a);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vdup_n_u8 (uint8_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vdup_nv8qi ((__builtin_neon_qi) __a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vdup_n_u16 (uint16_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vdup_nv4hi ((__builtin_neon_hi) __a);
+}
+
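/* Editor's illustration (not part of this patch): vcreate_* reinterprets a
   64-bit integer as a d-register vector, so a whole byte pattern can be
   written as one literal; on a little-endian target lane 0 is the low byte:

     uint8x8_t iota8 (void)
     {
       return vcreate_u8 (0x0706050403020100ULL);   // lanes 0..7 = 0..7
     }
*/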
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vdup_n_u32 (uint32_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vdup_nv2si ((__builtin_neon_si) __a);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vdup_n_p8 (poly8_t __a)
+{
+  return (poly8x8_t)__builtin_neon_vdup_nv8qi ((__builtin_neon_qi) __a);
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vdup_n_p16 (poly16_t __a)
+{
+  return (poly16x4_t)__builtin_neon_vdup_nv4hi ((__builtin_neon_hi) __a);
+}
+
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vdup_n_s64 (int64_t __a)
+{
+  return (int64x1_t)__builtin_neon_vdup_ndi ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vdup_n_u64 (uint64_t __a)
+{
+  return (uint64x1_t)__builtin_neon_vdup_ndi ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vdupq_n_s8 (int8_t __a)
+{
+  return (int8x16_t)__builtin_neon_vdup_nv16qi ((__builtin_neon_qi) __a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vdupq_n_s16 (int16_t __a)
+{
+  return (int16x8_t)__builtin_neon_vdup_nv8hi ((__builtin_neon_hi) __a);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vdupq_n_s32 (int32_t __a)
+{
+  return (int32x4_t)__builtin_neon_vdup_nv4si ((__builtin_neon_si) __a);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vdupq_n_f32 (float32_t __a)
+{
+  return (float32x4_t)__builtin_neon_vdup_nv4sf (__a);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vdupq_n_u8 (uint8_t __a)
+{
+  return (uint8x16_t)__builtin_neon_vdup_nv16qi ((__builtin_neon_qi) __a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vdupq_n_u16 (uint16_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vdup_nv8hi ((__builtin_neon_hi) __a);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vdupq_n_u32 (uint32_t __a)
+{
+  return (uint32x4_t)__builtin_neon_vdup_nv4si ((__builtin_neon_si) __a);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vdupq_n_p8 (poly8_t __a)
+{
+  return (poly8x16_t)__builtin_neon_vdup_nv16qi ((__builtin_neon_qi) __a);
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vdupq_n_p16 (poly16_t __a)
+{
+  return (poly16x8_t)__builtin_neon_vdup_nv8hi ((__builtin_neon_hi) __a);
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vdupq_n_s64 (int64_t __a)
+{
+  return (int64x2_t)__builtin_neon_vdup_nv2di ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vdupq_n_u64 (uint64_t __a)
+{
+  return (uint64x2_t)__builtin_neon_vdup_nv2di ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vmov_n_s8 (int8_t __a)
+{
+  return (int8x8_t)__builtin_neon_vdup_nv8qi ((__builtin_neon_qi) __a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vmov_n_s16 (int16_t __a)
+{
+  return (int16x4_t)__builtin_neon_vdup_nv4hi ((__builtin_neon_hi) __a);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vmov_n_s32 (int32_t __a)
+{
+  return (int32x2_t)__builtin_neon_vdup_nv2si ((__builtin_neon_si) __a);
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vmov_n_f32 (float32_t __a)
+{
+  return (float32x2_t)__builtin_neon_vdup_nv2sf (__a);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vmov_n_u8 (uint8_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vdup_nv8qi ((__builtin_neon_qi) __a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vmov_n_u16 (uint16_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vdup_nv4hi ((__builtin_neon_hi) __a);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vmov_n_u32 (uint32_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vdup_nv2si ((__builtin_neon_si) __a);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vmov_n_p8 (poly8_t __a)
+{
+  return (poly8x8_t)__builtin_neon_vdup_nv8qi ((__builtin_neon_qi) __a);
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vmov_n_p16 (poly16_t __a)
+{
+  return (poly16x4_t)__builtin_neon_vdup_nv4hi ((__builtin_neon_hi) __a);
+}
+
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vmov_n_s64 (int64_t __a)
+{
+  return (int64x1_t)__builtin_neon_vdup_ndi ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vmov_n_u64 (uint64_t __a)
+{
+  return (uint64x1_t)__builtin_neon_vdup_ndi ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vmovq_n_s8 (int8_t __a)
+{
+  return (int8x16_t)__builtin_neon_vdup_nv16qi ((__builtin_neon_qi) __a);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vmovq_n_s16 (int16_t __a)
+{
+  return (int16x8_t)__builtin_neon_vdup_nv8hi ((__builtin_neon_hi) __a);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vmovq_n_s32 (int32_t __a)
+{
+  return (int32x4_t)__builtin_neon_vdup_nv4si ((__builtin_neon_si) __a);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vmovq_n_f32 (float32_t __a)
+{
+  return (float32x4_t)__builtin_neon_vdup_nv4sf (__a);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vmovq_n_u8 (uint8_t __a)
+{
+  return (uint8x16_t)__builtin_neon_vdup_nv16qi ((__builtin_neon_qi) __a);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vmovq_n_u16 (uint16_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vdup_nv8hi ((__builtin_neon_hi) __a);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vmovq_n_u32 (uint32_t __a)
+{
+  return (uint32x4_t)__builtin_neon_vdup_nv4si ((__builtin_neon_si) __a);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vmovq_n_p8 (poly8_t __a)
+{
+  return (poly8x16_t)__builtin_neon_vdup_nv16qi ((__builtin_neon_qi) __a);
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vmovq_n_p16 (poly16_t __a)
+{
+  return (poly16x8_t)__builtin_neon_vdup_nv8hi ((__builtin_neon_hi) __a);
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vmovq_n_s64 (int64_t __a)
+{
+  return (int64x2_t)__builtin_neon_vdup_nv2di ((__builtin_neon_di) __a);
+}
+
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vmovq_n_u64 (uint64_t __a)
+{
+  return (uint64x2_t)__builtin_neon_vdup_nv2di ((__builtin_neon_di) __a);
+}
+
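/* Editor's note (not part of this patch): as the expansions above show, the
   vmov_n_* variants are synonyms of vdup_n_* — both map onto the same
   __builtin_neon_vdup_n* builtins, e.g.

     float32x4_t __zeros = vmovq_n_f32 (0.0f);   // same as vdupq_n_f32 (0.0f)
*/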
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vdup_lane_s8 (int8x8_t __a, const int __b)
+{
+  return (int8x8_t)__builtin_neon_vdup_lanev8qi (__a, __b);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vdup_lane_s16 (int16x4_t __a, const int __b)
+{
+  return (int16x4_t)__builtin_neon_vdup_lanev4hi (__a, __b);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vdup_lane_s32 (int32x2_t __a, const int __b)
+{
+  return (int32x2_t)__builtin_neon_vdup_lanev2si (__a, __b);
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vdup_lane_f32 (float32x2_t __a, const int __b)
+{
+  return (float32x2_t)__builtin_neon_vdup_lanev2sf (__a, __b);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vdup_lane_u8 (uint8x8_t __a, const int __b)
+{
+  return (uint8x8_t)__builtin_neon_vdup_lanev8qi ((int8x8_t) __a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vdup_lane_u16 (uint16x4_t __a, const int __b)
+{
+  return (uint16x4_t)__builtin_neon_vdup_lanev4hi ((int16x4_t) __a, __b);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vdup_lane_u32 (uint32x2_t __a, const int __b)
+{
+  return (uint32x2_t)__builtin_neon_vdup_lanev2si ((int32x2_t) __a, __b);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vdup_lane_p8 (poly8x8_t __a, const int __b)
+{
+  return (poly8x8_t)__builtin_neon_vdup_lanev8qi ((int8x8_t) __a, __b);
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vdup_lane_p16 (poly16x4_t __a, const int __b)
+{
+  return (poly16x4_t)__builtin_neon_vdup_lanev4hi ((int16x4_t) __a, __b);
+}
+
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vdup_lane_s64 (int64x1_t __a, const int __b)
+{
+  return (int64x1_t)__builtin_neon_vdup_lanedi (__a, __b);
+}
+
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vdup_lane_u64 (uint64x1_t __a, const int __b)
+{
+  return (uint64x1_t)__builtin_neon_vdup_lanedi ((int64x1_t) __a, __b);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vdupq_lane_s8 (int8x8_t __a, const int __b)
+{
+  return (int8x16_t)__builtin_neon_vdup_lanev16qi (__a, __b);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vdupq_lane_s16 (int16x4_t __a, const int __b)
+{
+  return (int16x8_t)__builtin_neon_vdup_lanev8hi (__a, __b);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vdupq_lane_s32 (int32x2_t __a, const int __b)
+{
+  return (int32x4_t)__builtin_neon_vdup_lanev4si (__a, __b);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vdupq_lane_f32 (float32x2_t __a, const int __b)
+{
+  return (float32x4_t)__builtin_neon_vdup_lanev4sf (__a, __b);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vdupq_lane_u8 (uint8x8_t __a, const int __b)
+{
+  return (uint8x16_t)__builtin_neon_vdup_lanev16qi ((int8x8_t) __a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vdupq_lane_u16 (uint16x4_t __a, const int __b)
+{
+  return (uint16x8_t)__builtin_neon_vdup_lanev8hi ((int16x4_t) __a, __b);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vdupq_lane_u32 (uint32x2_t __a, const int __b)
+{
+  return (uint32x4_t)__builtin_neon_vdup_lanev4si ((int32x2_t) __a, __b);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vdupq_lane_p8 (poly8x8_t __a, const int __b)
+{
+  return (poly8x16_t)__builtin_neon_vdup_lanev16qi ((int8x8_t) __a, __b);
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vdupq_lane_p16 (poly16x4_t __a, const int __b)
+{
+  return (poly16x8_t)__builtin_neon_vdup_lanev8hi ((int16x4_t) __a, __b);
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vdupq_lane_s64 (int64x1_t __a, const int __b)
+{
+  return (int64x2_t)__builtin_neon_vdup_lanev2di (__a, __b);
+}
+
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vdupq_lane_u64 (uint64x1_t __a, const int __b)
+{
+  return (uint64x2_t)__builtin_neon_vdup_lanev2di ((int64x1_t) __a, __b);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vcombine_s8 (int8x8_t __a, int8x8_t __b)
+{
+  return (int8x16_t)__builtin_neon_vcombinev8qi (__a, __b);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcombine_s16 (int16x4_t __a, int16x4_t __b)
+{
+  return (int16x8_t)__builtin_neon_vcombinev4hi (__a, __b);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vcombine_s32 (int32x2_t __a, int32x2_t __b)
+{
+  return (int32x4_t)__builtin_neon_vcombinev2si (__a, __b);
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vcombine_s64 (int64x1_t __a, int64x1_t __b)
+{
+  return (int64x2_t)__builtin_neon_vcombinedi (__a, __b);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vcombine_f32 (float32x2_t __a, float32x2_t __b)
+{
+  return (float32x4_t)__builtin_neon_vcombinev2sf (__a, __b);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vcombine_u8 (uint8x8_t __a, uint8x8_t __b)
+{
+  return (uint8x16_t)__builtin_neon_vcombinev8qi ((int8x8_t) __a, (int8x8_t) __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcombine_u16 (uint16x4_t __a, uint16x4_t __b)
+{
+  return (uint16x8_t)__builtin_neon_vcombinev4hi ((int16x4_t) __a, (int16x4_t) __b);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vcombine_u32 (uint32x2_t __a, uint32x2_t __b)
+{
+  return (uint32x4_t)__builtin_neon_vcombinev2si ((int32x2_t) __a, (int32x2_t) __b);
+}
+
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vcombine_u64 (uint64x1_t __a, uint64x1_t __b)
+{
+  return (uint64x2_t)__builtin_neon_vcombinedi ((int64x1_t) __a, (int64x1_t) __b);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vcombine_p8 (poly8x8_t __a, poly8x8_t __b)
+{
+  return (poly8x16_t)__builtin_neon_vcombinev8qi ((int8x8_t) __a, (int8x8_t) __b);
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vcombine_p16 (poly16x4_t __a, poly16x4_t __b)
+{
+  return (poly16x8_t)__builtin_neon_vcombinev4hi ((int16x4_t) __a, (int16x4_t) __b);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vget_high_s8 (int8x16_t __a)
+{
+  return (int8x8_t)__builtin_neon_vget_highv16qi (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vget_high_s16 (int16x8_t __a)
+{
+  return (int16x4_t)__builtin_neon_vget_highv8hi (__a);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vget_high_s32 (int32x4_t __a)
+{
+  return (int32x2_t)__builtin_neon_vget_highv4si (__a);
+}
+
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vget_high_s64 (int64x2_t __a)
+{
+  return (int64x1_t)__builtin_neon_vget_highv2di (__a);
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vget_high_f32 (float32x4_t __a)
+{
+  return (float32x2_t)__builtin_neon_vget_highv4sf (__a);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vget_high_u8 (uint8x16_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vget_highv16qi ((int8x16_t) __a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vget_high_u16 (uint16x8_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vget_highv8hi ((int16x8_t) __a);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vget_high_u32 (uint32x4_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vget_highv4si ((int32x4_t) __a);
+}
+
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vget_high_u64 (uint64x2_t __a)
+{
+  return (uint64x1_t)__builtin_neon_vget_highv2di ((int64x2_t) __a);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vget_high_p8 (poly8x16_t __a)
+{
+  return (poly8x8_t)__builtin_neon_vget_highv16qi ((int8x16_t) __a);
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vget_high_p16 (poly16x8_t __a)
+{
+  return (poly16x4_t)__builtin_neon_vget_highv8hi ((int16x8_t) __a);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vget_low_s8 (int8x16_t __a)
+{
+  return (int8x8_t)__builtin_neon_vget_lowv16qi (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vget_low_s16 (int16x8_t __a)
+{
+  return (int16x4_t)__builtin_neon_vget_lowv8hi (__a);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vget_low_s32 (int32x4_t __a)
+{
+  return (int32x2_t)__builtin_neon_vget_lowv4si (__a);
+}
+
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vget_low_s64 (int64x2_t __a)
+{
+  return (int64x1_t)__builtin_neon_vget_lowv2di (__a);
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vget_low_f32 (float32x4_t __a)
+{
+  return (float32x2_t)__builtin_neon_vget_lowv4sf (__a);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vget_low_u8 (uint8x16_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vget_lowv16qi ((int8x16_t) __a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vget_low_u16 (uint16x8_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vget_lowv8hi ((int16x8_t) __a);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vget_low_u32 (uint32x4_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vget_lowv4si ((int32x4_t) __a);
+}
+
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vget_low_u64 (uint64x2_t __a)
+{
+  return (uint64x1_t)__builtin_neon_vget_lowv2di ((int64x2_t) __a);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vget_low_p8 (poly8x16_t __a)
+{
+  return (poly8x8_t)__builtin_neon_vget_lowv16qi ((int8x16_t) __a);
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vget_low_p16 (poly16x8_t __a)
+{
+  return (poly16x4_t)__builtin_neon_vget_lowv8hi ((int16x8_t) __a);
+}
+
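/* Editor's illustration (not part of this patch): a q-register value can be
   split into d-register halves and reassembled, a common way to apply
   d-only operations to 128-bit data:

     uint8x16_t swap_halves (uint8x16_t __v)
     {
       return vcombine_u8 (vget_high_u8 (__v), vget_low_u8 (__v));
     }
*/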
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vcvt_s32_f32 (float32x2_t __a)
+{
+  return (int32x2_t)__builtin_neon_vcvtv2sf (__a, 1);
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vcvt_f32_s32 (int32x2_t __a)
+{
+  return (float32x2_t)__builtin_neon_vcvtv2si (__a, 1);
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vcvt_f32_u32 (uint32x2_t __a)
+{
+  return (float32x2_t)__builtin_neon_vcvtv2si ((int32x2_t) __a, 0);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vcvt_u32_f32 (float32x2_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vcvtv2sf (__a, 0);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vcvtq_s32_f32 (float32x4_t __a)
+{
+  return (int32x4_t)__builtin_neon_vcvtv4sf (__a, 1);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vcvtq_f32_s32 (int32x4_t __a)
+{
+  return (float32x4_t)__builtin_neon_vcvtv4si (__a, 1);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vcvtq_f32_u32 (uint32x4_t __a)
+{
+  return (float32x4_t)__builtin_neon_vcvtv4si ((int32x4_t) __a, 0);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vcvtq_u32_f32 (float32x4_t __a)
+{
+  return (uint32x4_t)__builtin_neon_vcvtv4sf (__a, 0);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vcvt_n_s32_f32 (float32x2_t __a, const int __b)
+{
+  return (int32x2_t)__builtin_neon_vcvt_nv2sf (__a, __b, 1);
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vcvt_n_f32_s32 (int32x2_t __a, const int __b)
+{
+  return (float32x2_t)__builtin_neon_vcvt_nv2si (__a, __b, 1);
+}
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vcvt_n_f32_u32 (uint32x2_t __a, const int __b)
+{
+  return (float32x2_t)__builtin_neon_vcvt_nv2si ((int32x2_t) __a, __b, 0);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vcvt_n_u32_f32 (float32x2_t __a, const int __b)
+{
+  return (uint32x2_t)__builtin_neon_vcvt_nv2sf (__a, __b, 0);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vcvtq_n_s32_f32 (float32x4_t __a, const int __b)
+{
+  return (int32x4_t)__builtin_neon_vcvt_nv4sf (__a, __b, 1);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vcvtq_n_f32_s32 (int32x4_t __a, const int __b)
+{
+  return (float32x4_t)__builtin_neon_vcvt_nv4si (__a, __b, 1);
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vcvtq_n_f32_u32 (uint32x4_t __a, const int __b)
+{
+  return (float32x4_t)__builtin_neon_vcvt_nv4si ((int32x4_t) __a, __b, 0);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vcvtq_n_u32_f32 (float32x4_t __a, const int __b)
+{
+  return (uint32x4_t)__builtin_neon_vcvt_nv4sf (__a, __b, 0);
+}
+
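/* Editor's illustration (not part of this patch): the _n conversions treat
   the integer side as fixed-point with the given number of fraction bits.
   A sketch converting floats to Q16 (16 fractional bits):

     int32x2_t to_q16 (float32x2_t __f)
     {
       return vcvt_n_s32_f32 (__f, 16);   // roughly __f * 2^16 per lane
     }
*/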
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vmovn_s16 (int16x8_t __a)
+{
+  return (int8x8_t)__builtin_neon_vmovnv8hi (__a, 1);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vmovn_s32 (int32x4_t __a)
+{
+  return (int16x4_t)__builtin_neon_vmovnv4si (__a, 1);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vmovn_s64 (int64x2_t __a)
+{
+  return (int32x2_t)__builtin_neon_vmovnv2di (__a, 1);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vmovn_u16 (uint16x8_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vmovnv8hi ((int16x8_t) __a, 0);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vmovn_u32 (uint32x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vmovnv4si ((int32x4_t) __a, 0);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vmovn_u64 (uint64x2_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vmovnv2di ((int64x2_t) __a, 0);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vqmovn_s16 (int16x8_t __a)
+{
+  return (int8x8_t)__builtin_neon_vqmovnv8hi (__a, 1);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqmovn_s32 (int32x4_t __a)
+{
+  return (int16x4_t)__builtin_neon_vqmovnv4si (__a, 1);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqmovn_s64 (int64x2_t __a)
+{
+  return (int32x2_t)__builtin_neon_vqmovnv2di (__a, 1);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vqmovn_u16 (uint16x8_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vqmovnv8hi ((int16x8_t) __a, 0);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vqmovn_u32 (uint32x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vqmovnv4si ((int32x4_t) __a, 0);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vqmovn_u64 (uint64x2_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vqmovnv2di ((int64x2_t) __a, 0);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vqmovun_s16 (int16x8_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vqmovunv8hi (__a, 1);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vqmovun_s32 (int32x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vqmovunv4si (__a, 1);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vqmovun_s64 (int64x2_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vqmovunv2di (__a, 1);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vmovl_s8 (int8x8_t __a)
+{
+  return (int16x8_t)__builtin_neon_vmovlv8qi (__a, 1);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vmovl_s16 (int16x4_t __a)
+{
+  return (int32x4_t)__builtin_neon_vmovlv4hi (__a, 1);
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vmovl_s32 (int32x2_t __a)
+{
+  return (int64x2_t)__builtin_neon_vmovlv2si (__a, 1);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vmovl_u8 (uint8x8_t __a)
+{
+  return (uint16x8_t)__builtin_neon_vmovlv8qi ((int8x8_t) __a, 0);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vmovl_u16 (uint16x4_t __a)
+{
+  return (uint32x4_t)__builtin_neon_vmovlv4hi ((int16x4_t) __a, 0);
+}
+
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vmovl_u32 (uint32x2_t __a)
+{
+  return (uint64x2_t)__builtin_neon_vmovlv2si ((int32x2_t) __a, 0);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vtbl1_s8 (int8x8_t __a, int8x8_t __b)
+{
+  return (int8x8_t)__builtin_neon_vtbl1v8qi (__a, __b);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vtbl1_u8 (uint8x8_t __a, uint8x8_t __b)
+{
+  return (uint8x8_t)__builtin_neon_vtbl1v8qi ((int8x8_t) __a, (int8x8_t) __b);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vtbl1_p8 (poly8x8_t __a, uint8x8_t __b)
+{
+  return (poly8x8_t)__builtin_neon_vtbl1v8qi ((int8x8_t) __a, (int8x8_t) __b);
+}
+
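/* Editor's illustration (not part of this patch): vmovl_* widen each lane,
   vqmovn_* narrow with saturation.  A sketch of a widen-process-narrow
   round trip on bytes, assuming vaddq_u16 defined earlier in this header:

     uint8x8_t brighten (uint8x8_t __px)
     {
       uint16x8_t __w = vmovl_u8 (__px);          // widen to 16 bits
       __w = vaddq_u16 (__w, vdupq_n_u16 (16));   // add with headroom
       return vqmovn_u16 (__w);                   // saturating narrow
     }
*/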
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vtbl2_s8 (int8x8x2_t __a, int8x8_t __b)
+{
+  union { int8x8x2_t __i; __builtin_neon_ti __o; } __au = { __a };
+  return (int8x8_t)__builtin_neon_vtbl2v8qi (__au.__o, __b);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vtbl2_u8 (uint8x8x2_t __a, uint8x8_t __b)
+{
+  union { uint8x8x2_t __i; __builtin_neon_ti __o; } __au = { __a };
+  return (uint8x8_t)__builtin_neon_vtbl2v8qi (__au.__o, (int8x8_t) __b);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vtbl2_p8 (poly8x8x2_t __a, uint8x8_t __b)
+{
+  union { poly8x8x2_t __i; __builtin_neon_ti __o; } __au = { __a };
+  return (poly8x8_t)__builtin_neon_vtbl2v8qi (__au.__o, (int8x8_t) __b);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vtbl3_s8 (int8x8x3_t __a, int8x8_t __b)
+{
+  union { int8x8x3_t __i; __builtin_neon_ei __o; } __au = { __a };
+  return (int8x8_t)__builtin_neon_vtbl3v8qi (__au.__o, __b);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vtbl3_u8 (uint8x8x3_t __a, uint8x8_t __b)
+{
+  union { uint8x8x3_t __i; __builtin_neon_ei __o; } __au = { __a };
+  return (uint8x8_t)__builtin_neon_vtbl3v8qi (__au.__o, (int8x8_t) __b);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vtbl3_p8 (poly8x8x3_t __a, uint8x8_t __b)
+{
+  union { poly8x8x3_t __i; __builtin_neon_ei __o; } __au = { __a };
+  return (poly8x8_t)__builtin_neon_vtbl3v8qi (__au.__o, (int8x8_t) __b);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vtbl4_s8 (int8x8x4_t __a, int8x8_t __b)
+{
+  union { int8x8x4_t __i; __builtin_neon_oi __o; } __au = { __a };
+  return (int8x8_t)__builtin_neon_vtbl4v8qi (__au.__o, __b);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vtbl4_u8 (uint8x8x4_t __a, uint8x8_t __b)
+{
+  union { uint8x8x4_t __i; __builtin_neon_oi __o; } __au = { __a };
+  return (uint8x8_t)__builtin_neon_vtbl4v8qi (__au.__o, (int8x8_t) __b);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vtbl4_p8 (poly8x8x4_t __a, uint8x8_t __b)
+{
+  union { poly8x8x4_t __i; __builtin_neon_oi __o; } __au = { __a };
+  return (poly8x8_t)__builtin_neon_vtbl4v8qi (__au.__o, (int8x8_t) __b);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vtbx1_s8 (int8x8_t __a, int8x8_t __b, int8x8_t __c)
+{
+  return (int8x8_t)__builtin_neon_vtbx1v8qi (__a, __b, __c);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vtbx1_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c)
+{
+  return (uint8x8_t)__builtin_neon_vtbx1v8qi ((int8x8_t) __a, (int8x8_t) __b, (int8x8_t) __c);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vtbx1_p8 (poly8x8_t __a, poly8x8_t __b, uint8x8_t __c)
+{
+  return (poly8x8_t)__builtin_neon_vtbx1v8qi ((int8x8_t) __a, (int8x8_t) __b, (int8x8_t) __c);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vtbx2_s8 (int8x8_t __a, int8x8x2_t __b, int8x8_t __c)
+{
+  union { int8x8x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  return (int8x8_t)__builtin_neon_vtbx2v8qi (__a, __bu.__o, __c);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vtbx2_u8 (uint8x8_t __a, uint8x8x2_t __b, uint8x8_t __c)
+{
+  union { uint8x8x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  return (uint8x8_t)__builtin_neon_vtbx2v8qi ((int8x8_t) __a, __bu.__o, (int8x8_t) __c);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vtbx2_p8 (poly8x8_t __a, poly8x8x2_t __b, uint8x8_t __c)
+{
+  union { poly8x8x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  return (poly8x8_t)__builtin_neon_vtbx2v8qi ((int8x8_t) __a, __bu.__o, (int8x8_t) __c);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vtbx3_s8 (int8x8_t __a, int8x8x3_t __b, int8x8_t __c)
+{
+  union { int8x8x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  return (int8x8_t)__builtin_neon_vtbx3v8qi (__a, __bu.__o, __c);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vtbx3_u8 (uint8x8_t __a, uint8x8x3_t __b, uint8x8_t __c)
+{
+  union { uint8x8x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  return (uint8x8_t)__builtin_neon_vtbx3v8qi ((int8x8_t) __a, __bu.__o, (int8x8_t) __c);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vtbx3_p8 (poly8x8_t __a, poly8x8x3_t __b, uint8x8_t __c)
+{
+  union { poly8x8x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  return (poly8x8_t)__builtin_neon_vtbx3v8qi ((int8x8_t) __a, __bu.__o, (int8x8_t) __c);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vtbx4_s8 (int8x8_t __a, int8x8x4_t __b, int8x8_t __c)
+{
+  union { int8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  return (int8x8_t)__builtin_neon_vtbx4v8qi (__a, __bu.__o, __c);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vtbx4_u8 (uint8x8_t __a, uint8x8x4_t __b, uint8x8_t __c)
+{
+  union { uint8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  return (uint8x8_t)__builtin_neon_vtbx4v8qi ((int8x8_t) __a, __bu.__o, (int8x8_t) __c);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vtbx4_p8 (poly8x8_t __a, poly8x8x4_t __b, uint8x8_t __c)
+{
+  union { poly8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  return (poly8x8_t)__builtin_neon_vtbx4v8qi ((int8x8_t) __a, __bu.__o, (int8x8_t) __c);
+}
+
((__always_inline__)) +vmulq_lane_s32 (int32x4_t __a, int32x2_t __b, const int __c) +{ + return (int32x4_t)__builtin_neon_vmul_lanev4si (__a, __b, __c, 1); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vmulq_lane_f32 (float32x4_t __a, float32x2_t __b, const int __c) +{ + return (float32x4_t)__builtin_neon_vmul_lanev4sf (__a, __b, __c, 3); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vmulq_lane_u16 (uint16x8_t __a, uint16x4_t __b, const int __c) +{ + return (uint16x8_t)__builtin_neon_vmul_lanev8hi ((int16x8_t) __a, (int16x4_t) __b, __c, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmulq_lane_u32 (uint32x4_t __a, uint32x2_t __b, const int __c) +{ + return (uint32x4_t)__builtin_neon_vmul_lanev4si ((int32x4_t) __a, (int32x2_t) __b, __c, 0); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vmla_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d) +{ + return (int16x4_t)__builtin_neon_vmla_lanev4hi (__a, __b, __c, __d, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vmla_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d) +{ + return (int32x2_t)__builtin_neon_vmla_lanev2si (__a, __b, __c, __d, 1); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vmla_lane_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c, const int __d) +{ + return (float32x2_t)__builtin_neon_vmla_lanev2sf (__a, __b, __c, __d, 3); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vmla_lane_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c, const int __d) +{ + return (uint16x4_t)__builtin_neon_vmla_lanev4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, __d, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vmla_lane_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c, const int __d) +{ + return (uint32x2_t)__builtin_neon_vmla_lanev2si ((int32x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, __d, 0); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vmlaq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d) +{ + return (int16x8_t)__builtin_neon_vmla_lanev8hi (__a, __b, __c, __d, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmlaq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d) +{ + return (int32x4_t)__builtin_neon_vmla_lanev4si (__a, __b, __c, __d, 1); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vmlaq_lane_f32 (float32x4_t __a, float32x4_t __b, float32x2_t __c, const int __d) +{ + return (float32x4_t)__builtin_neon_vmla_lanev4sf (__a, __b, __c, __d, 3); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vmlaq_lane_u16 (uint16x8_t __a, uint16x8_t __b, uint16x4_t __c, const int __d) +{ + return (uint16x8_t)__builtin_neon_vmla_lanev8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x4_t) __c, __d, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmlaq_lane_u32 (uint32x4_t __a, uint32x4_t __b, uint32x2_t __c, const int __d) +{ + return (uint32x4_t)__builtin_neon_vmla_lanev4si ((int32x4_t) __a, (int32x4_t) __b, (int32x2_t) __c, __d, 0); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmlal_lane_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c, 
const int __d) +{ + return (int32x4_t)__builtin_neon_vmlal_lanev4hi (__a, __b, __c, __d, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vmlal_lane_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c, const int __d) +{ + return (int64x2_t)__builtin_neon_vmlal_lanev2si (__a, __b, __c, __d, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmlal_lane_u16 (uint32x4_t __a, uint16x4_t __b, uint16x4_t __c, const int __d) +{ + return (uint32x4_t)__builtin_neon_vmlal_lanev4hi ((int32x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, __d, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vmlal_lane_u32 (uint64x2_t __a, uint32x2_t __b, uint32x2_t __c, const int __d) +{ + return (uint64x2_t)__builtin_neon_vmlal_lanev2si ((int64x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, __d, 0); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqdmlal_lane_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c, const int __d) +{ + return (int32x4_t)__builtin_neon_vqdmlal_lanev4hi (__a, __b, __c, __d, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vqdmlal_lane_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c, const int __d) +{ + return (int64x2_t)__builtin_neon_vqdmlal_lanev2si (__a, __b, __c, __d, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vmls_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d) +{ + return (int16x4_t)__builtin_neon_vmls_lanev4hi (__a, __b, __c, __d, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vmls_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d) +{ + return (int32x2_t)__builtin_neon_vmls_lanev2si (__a, __b, __c, __d, 1); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vmls_lane_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c, const int __d) +{ + return (float32x2_t)__builtin_neon_vmls_lanev2sf (__a, __b, __c, __d, 3); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vmls_lane_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c, const int __d) +{ + return (uint16x4_t)__builtin_neon_vmls_lanev4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, __d, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vmls_lane_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c, const int __d) +{ + return (uint32x2_t)__builtin_neon_vmls_lanev2si ((int32x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, __d, 0); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vmlsq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d) +{ + return (int16x8_t)__builtin_neon_vmls_lanev8hi (__a, __b, __c, __d, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmlsq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d) +{ + return (int32x4_t)__builtin_neon_vmls_lanev4si (__a, __b, __c, __d, 1); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vmlsq_lane_f32 (float32x4_t __a, float32x4_t __b, float32x2_t __c, const int __d) +{ + return (float32x4_t)__builtin_neon_vmls_lanev4sf (__a, __b, __c, __d, 3); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vmlsq_lane_u16 (uint16x8_t __a, uint16x8_t __b, uint16x4_t __c, const int __d) +{ + return 
(uint16x8_t)__builtin_neon_vmls_lanev8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x4_t) __c, __d, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmlsq_lane_u32 (uint32x4_t __a, uint32x4_t __b, uint32x2_t __c, const int __d) +{ + return (uint32x4_t)__builtin_neon_vmls_lanev4si ((int32x4_t) __a, (int32x4_t) __b, (int32x2_t) __c, __d, 0); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmlsl_lane_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c, const int __d) +{ + return (int32x4_t)__builtin_neon_vmlsl_lanev4hi (__a, __b, __c, __d, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vmlsl_lane_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c, const int __d) +{ + return (int64x2_t)__builtin_neon_vmlsl_lanev2si (__a, __b, __c, __d, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmlsl_lane_u16 (uint32x4_t __a, uint16x4_t __b, uint16x4_t __c, const int __d) +{ + return (uint32x4_t)__builtin_neon_vmlsl_lanev4hi ((int32x4_t) __a, (int16x4_t) __b, (int16x4_t) __c, __d, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vmlsl_lane_u32 (uint64x2_t __a, uint32x2_t __b, uint32x2_t __c, const int __d) +{ + return (uint64x2_t)__builtin_neon_vmlsl_lanev2si ((int64x2_t) __a, (int32x2_t) __b, (int32x2_t) __c, __d, 0); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqdmlsl_lane_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c, const int __d) +{ + return (int32x4_t)__builtin_neon_vqdmlsl_lanev4hi (__a, __b, __c, __d, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vqdmlsl_lane_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c, const int __d) +{ + return (int64x2_t)__builtin_neon_vqdmlsl_lanev2si (__a, __b, __c, __d, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmull_lane_s16 (int16x4_t __a, int16x4_t __b, const int __c) +{ + return (int32x4_t)__builtin_neon_vmull_lanev4hi (__a, __b, __c, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vmull_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c) +{ + return (int64x2_t)__builtin_neon_vmull_lanev2si (__a, __b, __c, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmull_lane_u16 (uint16x4_t __a, uint16x4_t __b, const int __c) +{ + return (uint32x4_t)__builtin_neon_vmull_lanev4hi ((int16x4_t) __a, (int16x4_t) __b, __c, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vmull_lane_u32 (uint32x2_t __a, uint32x2_t __b, const int __c) +{ + return (uint64x2_t)__builtin_neon_vmull_lanev2si ((int32x2_t) __a, (int32x2_t) __b, __c, 0); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqdmull_lane_s16 (int16x4_t __a, int16x4_t __b, const int __c) +{ + return (int32x4_t)__builtin_neon_vqdmull_lanev4hi (__a, __b, __c, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vqdmull_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c) +{ + return (int64x2_t)__builtin_neon_vqdmull_lanev2si (__a, __b, __c, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vqdmulhq_lane_s16 (int16x8_t __a, int16x4_t __b, const int __c) +{ + return (int16x8_t)__builtin_neon_vqdmulh_lanev8hi (__a, __b, __c, 1); +} + +__extension__ static __inline int32x4_t __attribute__ 
((__always_inline__)) +vqdmulhq_lane_s32 (int32x4_t __a, int32x2_t __b, const int __c) +{ + return (int32x4_t)__builtin_neon_vqdmulh_lanev4si (__a, __b, __c, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vqdmulh_lane_s16 (int16x4_t __a, int16x4_t __b, const int __c) +{ + return (int16x4_t)__builtin_neon_vqdmulh_lanev4hi (__a, __b, __c, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vqdmulh_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c) +{ + return (int32x2_t)__builtin_neon_vqdmulh_lanev2si (__a, __b, __c, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vqrdmulhq_lane_s16 (int16x8_t __a, int16x4_t __b, const int __c) +{ + return (int16x8_t)__builtin_neon_vqdmulh_lanev8hi (__a, __b, __c, 5); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqrdmulhq_lane_s32 (int32x4_t __a, int32x2_t __b, const int __c) +{ + return (int32x4_t)__builtin_neon_vqdmulh_lanev4si (__a, __b, __c, 5); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vqrdmulh_lane_s16 (int16x4_t __a, int16x4_t __b, const int __c) +{ + return (int16x4_t)__builtin_neon_vqdmulh_lanev4hi (__a, __b, __c, 5); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vqrdmulh_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c) +{ + return (int32x2_t)__builtin_neon_vqdmulh_lanev2si (__a, __b, __c, 5); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vmul_n_s16 (int16x4_t __a, int16_t __b) +{ + return (int16x4_t)__builtin_neon_vmul_nv4hi (__a, (__builtin_neon_hi) __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vmul_n_s32 (int32x2_t __a, int32_t __b) +{ + return (int32x2_t)__builtin_neon_vmul_nv2si (__a, (__builtin_neon_si) __b, 1); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vmul_n_f32 (float32x2_t __a, float32_t __b) +{ + return (float32x2_t)__builtin_neon_vmul_nv2sf (__a, __b, 3); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vmul_n_u16 (uint16x4_t __a, uint16_t __b) +{ + return (uint16x4_t)__builtin_neon_vmul_nv4hi ((int16x4_t) __a, (__builtin_neon_hi) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vmul_n_u32 (uint32x2_t __a, uint32_t __b) +{ + return (uint32x2_t)__builtin_neon_vmul_nv2si ((int32x2_t) __a, (__builtin_neon_si) __b, 0); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vmulq_n_s16 (int16x8_t __a, int16_t __b) +{ + return (int16x8_t)__builtin_neon_vmul_nv8hi (__a, (__builtin_neon_hi) __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmulq_n_s32 (int32x4_t __a, int32_t __b) +{ + return (int32x4_t)__builtin_neon_vmul_nv4si (__a, (__builtin_neon_si) __b, 1); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vmulq_n_f32 (float32x4_t __a, float32_t __b) +{ + return (float32x4_t)__builtin_neon_vmul_nv4sf (__a, __b, 3); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vmulq_n_u16 (uint16x8_t __a, uint16_t __b) +{ + return (uint16x8_t)__builtin_neon_vmul_nv8hi ((int16x8_t) __a, (__builtin_neon_hi) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmulq_n_u32 (uint32x4_t __a, uint32_t __b) +{ + return 
(uint32x4_t)__builtin_neon_vmul_nv4si ((int32x4_t) __a, (__builtin_neon_si) __b, 0); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmull_n_s16 (int16x4_t __a, int16_t __b) +{ + return (int32x4_t)__builtin_neon_vmull_nv4hi (__a, (__builtin_neon_hi) __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vmull_n_s32 (int32x2_t __a, int32_t __b) +{ + return (int64x2_t)__builtin_neon_vmull_nv2si (__a, (__builtin_neon_si) __b, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmull_n_u16 (uint16x4_t __a, uint16_t __b) +{ + return (uint32x4_t)__builtin_neon_vmull_nv4hi ((int16x4_t) __a, (__builtin_neon_hi) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vmull_n_u32 (uint32x2_t __a, uint32_t __b) +{ + return (uint64x2_t)__builtin_neon_vmull_nv2si ((int32x2_t) __a, (__builtin_neon_si) __b, 0); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqdmull_n_s16 (int16x4_t __a, int16_t __b) +{ + return (int32x4_t)__builtin_neon_vqdmull_nv4hi (__a, (__builtin_neon_hi) __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vqdmull_n_s32 (int32x2_t __a, int32_t __b) +{ + return (int64x2_t)__builtin_neon_vqdmull_nv2si (__a, (__builtin_neon_si) __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vqdmulhq_n_s16 (int16x8_t __a, int16_t __b) +{ + return (int16x8_t)__builtin_neon_vqdmulh_nv8hi (__a, (__builtin_neon_hi) __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqdmulhq_n_s32 (int32x4_t __a, int32_t __b) +{ + return (int32x4_t)__builtin_neon_vqdmulh_nv4si (__a, (__builtin_neon_si) __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vqdmulh_n_s16 (int16x4_t __a, int16_t __b) +{ + return (int16x4_t)__builtin_neon_vqdmulh_nv4hi (__a, (__builtin_neon_hi) __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vqdmulh_n_s32 (int32x2_t __a, int32_t __b) +{ + return (int32x2_t)__builtin_neon_vqdmulh_nv2si (__a, (__builtin_neon_si) __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vqrdmulhq_n_s16 (int16x8_t __a, int16_t __b) +{ + return (int16x8_t)__builtin_neon_vqdmulh_nv8hi (__a, (__builtin_neon_hi) __b, 5); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqrdmulhq_n_s32 (int32x4_t __a, int32_t __b) +{ + return (int32x4_t)__builtin_neon_vqdmulh_nv4si (__a, (__builtin_neon_si) __b, 5); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vqrdmulh_n_s16 (int16x4_t __a, int16_t __b) +{ + return (int16x4_t)__builtin_neon_vqdmulh_nv4hi (__a, (__builtin_neon_hi) __b, 5); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vqrdmulh_n_s32 (int32x2_t __a, int32_t __b) +{ + return (int32x2_t)__builtin_neon_vqdmulh_nv2si (__a, (__builtin_neon_si) __b, 5); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vmla_n_s16 (int16x4_t __a, int16x4_t __b, int16_t __c) +{ + return (int16x4_t)__builtin_neon_vmla_nv4hi (__a, __b, (__builtin_neon_hi) __c, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vmla_n_s32 (int32x2_t __a, int32x2_t __b, int32_t __c) +{ + return (int32x2_t)__builtin_neon_vmla_nv2si (__a, __b, (__builtin_neon_si) __c, 1); +} + 
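The _n intrinsics above take a scalar second (or third) operand and broadcast it across all lanes, so no explicit vdup appears in the source; the trailing integer constant handed to each builtin selects the variant (signed/unsigned/float, or the rounding form for vqrdmulh). A short sketch under the same assumptions as the previous example (hypothetical names, illustrative only):

    #include <arm_neon.h>

    /* Illustrative sketch: y + x * gain across four 16-bit lanes; the
       scalar gain is broadcast by the instruction itself.  */
    int16x4_t scale_acc (int16x4_t y, int16x4_t x, int16_t gain)
    {
      return vmla_n_s16 (y, x, gain);
    }

    /* Q15 fixed-point multiply: saturate ((2 * x * c + 0x8000) >> 16),
       i.e. the rounding, doubling, high-half multiply.  */
    int16x4_t q15_mul (int16x4_t x, int16_t c)
    {
      return vqrdmulh_n_s16 (x, c);
    }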
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vmla_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c) +{ + return (float32x2_t)__builtin_neon_vmla_nv2sf (__a, __b, __c, 3); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vmla_n_u16 (uint16x4_t __a, uint16x4_t __b, uint16_t __c) +{ + return (uint16x4_t)__builtin_neon_vmla_nv4hi ((int16x4_t) __a, (int16x4_t) __b, (__builtin_neon_hi) __c, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vmla_n_u32 (uint32x2_t __a, uint32x2_t __b, uint32_t __c) +{ + return (uint32x2_t)__builtin_neon_vmla_nv2si ((int32x2_t) __a, (int32x2_t) __b, (__builtin_neon_si) __c, 0); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vmlaq_n_s16 (int16x8_t __a, int16x8_t __b, int16_t __c) +{ + return (int16x8_t)__builtin_neon_vmla_nv8hi (__a, __b, (__builtin_neon_hi) __c, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmlaq_n_s32 (int32x4_t __a, int32x4_t __b, int32_t __c) +{ + return (int32x4_t)__builtin_neon_vmla_nv4si (__a, __b, (__builtin_neon_si) __c, 1); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vmlaq_n_f32 (float32x4_t __a, float32x4_t __b, float32_t __c) +{ + return (float32x4_t)__builtin_neon_vmla_nv4sf (__a, __b, __c, 3); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vmlaq_n_u16 (uint16x8_t __a, uint16x8_t __b, uint16_t __c) +{ + return (uint16x8_t)__builtin_neon_vmla_nv8hi ((int16x8_t) __a, (int16x8_t) __b, (__builtin_neon_hi) __c, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmlaq_n_u32 (uint32x4_t __a, uint32x4_t __b, uint32_t __c) +{ + return (uint32x4_t)__builtin_neon_vmla_nv4si ((int32x4_t) __a, (int32x4_t) __b, (__builtin_neon_si) __c, 0); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmlal_n_s16 (int32x4_t __a, int16x4_t __b, int16_t __c) +{ + return (int32x4_t)__builtin_neon_vmlal_nv4hi (__a, __b, (__builtin_neon_hi) __c, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vmlal_n_s32 (int64x2_t __a, int32x2_t __b, int32_t __c) +{ + return (int64x2_t)__builtin_neon_vmlal_nv2si (__a, __b, (__builtin_neon_si) __c, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmlal_n_u16 (uint32x4_t __a, uint16x4_t __b, uint16_t __c) +{ + return (uint32x4_t)__builtin_neon_vmlal_nv4hi ((int32x4_t) __a, (int16x4_t) __b, (__builtin_neon_hi) __c, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vmlal_n_u32 (uint64x2_t __a, uint32x2_t __b, uint32_t __c) +{ + return (uint64x2_t)__builtin_neon_vmlal_nv2si ((int64x2_t) __a, (int32x2_t) __b, (__builtin_neon_si) __c, 0); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqdmlal_n_s16 (int32x4_t __a, int16x4_t __b, int16_t __c) +{ + return (int32x4_t)__builtin_neon_vqdmlal_nv4hi (__a, __b, (__builtin_neon_hi) __c, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vqdmlal_n_s32 (int64x2_t __a, int32x2_t __b, int32_t __c) +{ + return (int64x2_t)__builtin_neon_vqdmlal_nv2si (__a, __b, (__builtin_neon_si) __c, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vmls_n_s16 (int16x4_t __a, int16x4_t __b, int16_t __c) +{ + return (int16x4_t)__builtin_neon_vmls_nv4hi (__a, __b, 
(__builtin_neon_hi) __c, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vmls_n_s32 (int32x2_t __a, int32x2_t __b, int32_t __c) +{ + return (int32x2_t)__builtin_neon_vmls_nv2si (__a, __b, (__builtin_neon_si) __c, 1); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vmls_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c) +{ + return (float32x2_t)__builtin_neon_vmls_nv2sf (__a, __b, __c, 3); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vmls_n_u16 (uint16x4_t __a, uint16x4_t __b, uint16_t __c) +{ + return (uint16x4_t)__builtin_neon_vmls_nv4hi ((int16x4_t) __a, (int16x4_t) __b, (__builtin_neon_hi) __c, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vmls_n_u32 (uint32x2_t __a, uint32x2_t __b, uint32_t __c) +{ + return (uint32x2_t)__builtin_neon_vmls_nv2si ((int32x2_t) __a, (int32x2_t) __b, (__builtin_neon_si) __c, 0); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vmlsq_n_s16 (int16x8_t __a, int16x8_t __b, int16_t __c) +{ + return (int16x8_t)__builtin_neon_vmls_nv8hi (__a, __b, (__builtin_neon_hi) __c, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmlsq_n_s32 (int32x4_t __a, int32x4_t __b, int32_t __c) +{ + return (int32x4_t)__builtin_neon_vmls_nv4si (__a, __b, (__builtin_neon_si) __c, 1); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vmlsq_n_f32 (float32x4_t __a, float32x4_t __b, float32_t __c) +{ + return (float32x4_t)__builtin_neon_vmls_nv4sf (__a, __b, __c, 3); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vmlsq_n_u16 (uint16x8_t __a, uint16x8_t __b, uint16_t __c) +{ + return (uint16x8_t)__builtin_neon_vmls_nv8hi ((int16x8_t) __a, (int16x8_t) __b, (__builtin_neon_hi) __c, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmlsq_n_u32 (uint32x4_t __a, uint32x4_t __b, uint32_t __c) +{ + return (uint32x4_t)__builtin_neon_vmls_nv4si ((int32x4_t) __a, (int32x4_t) __b, (__builtin_neon_si) __c, 0); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vmlsl_n_s16 (int32x4_t __a, int16x4_t __b, int16_t __c) +{ + return (int32x4_t)__builtin_neon_vmlsl_nv4hi (__a, __b, (__builtin_neon_hi) __c, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vmlsl_n_s32 (int64x2_t __a, int32x2_t __b, int32_t __c) +{ + return (int64x2_t)__builtin_neon_vmlsl_nv2si (__a, __b, (__builtin_neon_si) __c, 1); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vmlsl_n_u16 (uint32x4_t __a, uint16x4_t __b, uint16_t __c) +{ + return (uint32x4_t)__builtin_neon_vmlsl_nv4hi ((int32x4_t) __a, (int16x4_t) __b, (__builtin_neon_hi) __c, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vmlsl_n_u32 (uint64x2_t __a, uint32x2_t __b, uint32_t __c) +{ + return (uint64x2_t)__builtin_neon_vmlsl_nv2si ((int64x2_t) __a, (int32x2_t) __b, (__builtin_neon_si) __c, 0); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vqdmlsl_n_s16 (int32x4_t __a, int16x4_t __b, int16_t __c) +{ + return (int32x4_t)__builtin_neon_vqdmlsl_nv4hi (__a, __b, (__builtin_neon_hi) __c, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vqdmlsl_n_s32 (int64x2_t __a, int32x2_t __b, int32_t __c) +{ + return 
(int64x2_t)__builtin_neon_vqdmlsl_nv2si (__a, __b, (__builtin_neon_si) __c, 1); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vext_s8 (int8x8_t __a, int8x8_t __b, const int __c) +{ + return (int8x8_t)__builtin_neon_vextv8qi (__a, __b, __c); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vext_s16 (int16x4_t __a, int16x4_t __b, const int __c) +{ + return (int16x4_t)__builtin_neon_vextv4hi (__a, __b, __c); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vext_s32 (int32x2_t __a, int32x2_t __b, const int __c) +{ + return (int32x2_t)__builtin_neon_vextv2si (__a, __b, __c); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vext_s64 (int64x1_t __a, int64x1_t __b, const int __c) +{ + return (int64x1_t)__builtin_neon_vextdi (__a, __b, __c); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vext_f32 (float32x2_t __a, float32x2_t __b, const int __c) +{ + return (float32x2_t)__builtin_neon_vextv2sf (__a, __b, __c); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vext_u8 (uint8x8_t __a, uint8x8_t __b, const int __c) +{ + return (uint8x8_t)__builtin_neon_vextv8qi ((int8x8_t) __a, (int8x8_t) __b, __c); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vext_u16 (uint16x4_t __a, uint16x4_t __b, const int __c) +{ + return (uint16x4_t)__builtin_neon_vextv4hi ((int16x4_t) __a, (int16x4_t) __b, __c); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vext_u32 (uint32x2_t __a, uint32x2_t __b, const int __c) +{ + return (uint32x2_t)__builtin_neon_vextv2si ((int32x2_t) __a, (int32x2_t) __b, __c); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vext_u64 (uint64x1_t __a, uint64x1_t __b, const int __c) +{ + return (uint64x1_t)__builtin_neon_vextdi ((int64x1_t) __a, (int64x1_t) __b, __c); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vext_p8 (poly8x8_t __a, poly8x8_t __b, const int __c) +{ + return (poly8x8_t)__builtin_neon_vextv8qi ((int8x8_t) __a, (int8x8_t) __b, __c); +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vext_p16 (poly16x4_t __a, poly16x4_t __b, const int __c) +{ + return (poly16x4_t)__builtin_neon_vextv4hi ((int16x4_t) __a, (int16x4_t) __b, __c); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vextq_s8 (int8x16_t __a, int8x16_t __b, const int __c) +{ + return (int8x16_t)__builtin_neon_vextv16qi (__a, __b, __c); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vextq_s16 (int16x8_t __a, int16x8_t __b, const int __c) +{ + return (int16x8_t)__builtin_neon_vextv8hi (__a, __b, __c); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vextq_s32 (int32x4_t __a, int32x4_t __b, const int __c) +{ + return (int32x4_t)__builtin_neon_vextv4si (__a, __b, __c); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vextq_s64 (int64x2_t __a, int64x2_t __b, const int __c) +{ + return (int64x2_t)__builtin_neon_vextv2di (__a, __b, __c); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vextq_f32 (float32x4_t __a, float32x4_t __b, const int __c) +{ + return (float32x4_t)__builtin_neon_vextv4sf (__a, __b, __c); +} + +__extension__ static __inline uint8x16_t __attribute__ 
((__always_inline__)) +vextq_u8 (uint8x16_t __a, uint8x16_t __b, const int __c) +{ + return (uint8x16_t)__builtin_neon_vextv16qi ((int8x16_t) __a, (int8x16_t) __b, __c); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vextq_u16 (uint16x8_t __a, uint16x8_t __b, const int __c) +{ + return (uint16x8_t)__builtin_neon_vextv8hi ((int16x8_t) __a, (int16x8_t) __b, __c); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vextq_u32 (uint32x4_t __a, uint32x4_t __b, const int __c) +{ + return (uint32x4_t)__builtin_neon_vextv4si ((int32x4_t) __a, (int32x4_t) __b, __c); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vextq_u64 (uint64x2_t __a, uint64x2_t __b, const int __c) +{ + return (uint64x2_t)__builtin_neon_vextv2di ((int64x2_t) __a, (int64x2_t) __b, __c); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vextq_p8 (poly8x16_t __a, poly8x16_t __b, const int __c) +{ + return (poly8x16_t)__builtin_neon_vextv16qi ((int8x16_t) __a, (int8x16_t) __b, __c); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vextq_p16 (poly16x8_t __a, poly16x8_t __b, const int __c) +{ + return (poly16x8_t)__builtin_neon_vextv8hi ((int16x8_t) __a, (int16x8_t) __b, __c); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vrev64_s8 (int8x8_t __a) +{ + return (int8x8_t)__builtin_neon_vrev64v8qi (__a, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vrev64_s16 (int16x4_t __a) +{ + return (int16x4_t)__builtin_neon_vrev64v4hi (__a, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vrev64_s32 (int32x2_t __a) +{ + return (int32x2_t)__builtin_neon_vrev64v2si (__a, 1); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vrev64_f32 (float32x2_t __a) +{ + return (float32x2_t)__builtin_neon_vrev64v2sf (__a, 3); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vrev64_u8 (uint8x8_t __a) +{ + return (uint8x8_t)__builtin_neon_vrev64v8qi ((int8x8_t) __a, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vrev64_u16 (uint16x4_t __a) +{ + return (uint16x4_t)__builtin_neon_vrev64v4hi ((int16x4_t) __a, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vrev64_u32 (uint32x2_t __a) +{ + return (uint32x2_t)__builtin_neon_vrev64v2si ((int32x2_t) __a, 0); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vrev64_p8 (poly8x8_t __a) +{ + return (poly8x8_t)__builtin_neon_vrev64v8qi ((int8x8_t) __a, 2); +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vrev64_p16 (poly16x4_t __a) +{ + return (poly16x4_t)__builtin_neon_vrev64v4hi ((int16x4_t) __a, 2); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vrev64q_s8 (int8x16_t __a) +{ + return (int8x16_t)__builtin_neon_vrev64v16qi (__a, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vrev64q_s16 (int16x8_t __a) +{ + return (int16x8_t)__builtin_neon_vrev64v8hi (__a, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vrev64q_s32 (int32x4_t __a) +{ + return (int32x4_t)__builtin_neon_vrev64v4si (__a, 1); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vrev64q_f32 (float32x4_t __a) +{ + return 
(float32x4_t)__builtin_neon_vrev64v4sf (__a, 3); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vrev64q_u8 (uint8x16_t __a) +{ + return (uint8x16_t)__builtin_neon_vrev64v16qi ((int8x16_t) __a, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vrev64q_u16 (uint16x8_t __a) +{ + return (uint16x8_t)__builtin_neon_vrev64v8hi ((int16x8_t) __a, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vrev64q_u32 (uint32x4_t __a) +{ + return (uint32x4_t)__builtin_neon_vrev64v4si ((int32x4_t) __a, 0); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vrev64q_p8 (poly8x16_t __a) +{ + return (poly8x16_t)__builtin_neon_vrev64v16qi ((int8x16_t) __a, 2); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vrev64q_p16 (poly16x8_t __a) +{ + return (poly16x8_t)__builtin_neon_vrev64v8hi ((int16x8_t) __a, 2); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vrev32_s8 (int8x8_t __a) +{ + return (int8x8_t)__builtin_neon_vrev32v8qi (__a, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vrev32_s16 (int16x4_t __a) +{ + return (int16x4_t)__builtin_neon_vrev32v4hi (__a, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vrev32_u8 (uint8x8_t __a) +{ + return (uint8x8_t)__builtin_neon_vrev32v8qi ((int8x8_t) __a, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vrev32_u16 (uint16x4_t __a) +{ + return (uint16x4_t)__builtin_neon_vrev32v4hi ((int16x4_t) __a, 0); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vrev32_p8 (poly8x8_t __a) +{ + return (poly8x8_t)__builtin_neon_vrev32v8qi ((int8x8_t) __a, 2); +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vrev32_p16 (poly16x4_t __a) +{ + return (poly16x4_t)__builtin_neon_vrev32v4hi ((int16x4_t) __a, 2); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vrev32q_s8 (int8x16_t __a) +{ + return (int8x16_t)__builtin_neon_vrev32v16qi (__a, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vrev32q_s16 (int16x8_t __a) +{ + return (int16x8_t)__builtin_neon_vrev32v8hi (__a, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vrev32q_u8 (uint8x16_t __a) +{ + return (uint8x16_t)__builtin_neon_vrev32v16qi ((int8x16_t) __a, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vrev32q_u16 (uint16x8_t __a) +{ + return (uint16x8_t)__builtin_neon_vrev32v8hi ((int16x8_t) __a, 0); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vrev32q_p8 (poly8x16_t __a) +{ + return (poly8x16_t)__builtin_neon_vrev32v16qi ((int8x16_t) __a, 2); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vrev32q_p16 (poly16x8_t __a) +{ + return (poly16x8_t)__builtin_neon_vrev32v8hi ((int16x8_t) __a, 2); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vrev16_s8 (int8x8_t __a) +{ + return (int8x8_t)__builtin_neon_vrev16v8qi (__a, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vrev16_u8 (uint8x8_t __a) +{ + return (uint8x8_t)__builtin_neon_vrev16v8qi ((int8x8_t) __a, 0); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vrev16_p8 
(poly8x8_t __a) +{ + return (poly8x8_t)__builtin_neon_vrev16v8qi ((int8x8_t) __a, 2); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vrev16q_s8 (int8x16_t __a) +{ + return (int8x16_t)__builtin_neon_vrev16v16qi (__a, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vrev16q_u8 (uint8x16_t __a) +{ + return (uint8x16_t)__builtin_neon_vrev16v16qi ((int8x16_t) __a, 0); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vrev16q_p8 (poly8x16_t __a) +{ + return (poly8x16_t)__builtin_neon_vrev16v16qi ((int8x16_t) __a, 2); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vbsl_s8 (uint8x8_t __a, int8x8_t __b, int8x8_t __c) +{ + return (int8x8_t)__builtin_neon_vbslv8qi ((int8x8_t) __a, __b, __c); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vbsl_s16 (uint16x4_t __a, int16x4_t __b, int16x4_t __c) +{ + return (int16x4_t)__builtin_neon_vbslv4hi ((int16x4_t) __a, __b, __c); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vbsl_s32 (uint32x2_t __a, int32x2_t __b, int32x2_t __c) +{ + return (int32x2_t)__builtin_neon_vbslv2si ((int32x2_t) __a, __b, __c); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vbsl_s64 (uint64x1_t __a, int64x1_t __b, int64x1_t __c) +{ + return (int64x1_t)__builtin_neon_vbsldi ((int64x1_t) __a, __b, __c); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vbsl_f32 (uint32x2_t __a, float32x2_t __b, float32x2_t __c) +{ + return (float32x2_t)__builtin_neon_vbslv2sf ((int32x2_t) __a, __b, __c); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vbsl_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c) +{ + return (uint8x8_t)__builtin_neon_vbslv8qi ((int8x8_t) __a, (int8x8_t) __b, (int8x8_t) __c); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vbsl_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c) +{ + return (uint16x4_t)__builtin_neon_vbslv4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vbsl_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c) +{ + return (uint32x2_t)__builtin_neon_vbslv2si ((int32x2_t) __a, (int32x2_t) __b, (int32x2_t) __c); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vbsl_u64 (uint64x1_t __a, uint64x1_t __b, uint64x1_t __c) +{ + return (uint64x1_t)__builtin_neon_vbsldi ((int64x1_t) __a, (int64x1_t) __b, (int64x1_t) __c); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vbsl_p8 (uint8x8_t __a, poly8x8_t __b, poly8x8_t __c) +{ + return (poly8x8_t)__builtin_neon_vbslv8qi ((int8x8_t) __a, (int8x8_t) __b, (int8x8_t) __c); +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vbsl_p16 (uint16x4_t __a, poly16x4_t __b, poly16x4_t __c) +{ + return (poly16x4_t)__builtin_neon_vbslv4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vbslq_s8 (uint8x16_t __a, int8x16_t __b, int8x16_t __c) +{ + return (int8x16_t)__builtin_neon_vbslv16qi ((int8x16_t) __a, __b, __c); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vbslq_s16 (uint16x8_t __a, int16x8_t __b, int16x8_t __c) +{ + return (int16x8_t)__builtin_neon_vbslv8hi ((int16x8_t) 
__a, __b, __c); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vbslq_s32 (uint32x4_t __a, int32x4_t __b, int32x4_t __c) +{ + return (int32x4_t)__builtin_neon_vbslv4si ((int32x4_t) __a, __b, __c); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vbslq_s64 (uint64x2_t __a, int64x2_t __b, int64x2_t __c) +{ + return (int64x2_t)__builtin_neon_vbslv2di ((int64x2_t) __a, __b, __c); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vbslq_f32 (uint32x4_t __a, float32x4_t __b, float32x4_t __c) +{ + return (float32x4_t)__builtin_neon_vbslv4sf ((int32x4_t) __a, __b, __c); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vbslq_u8 (uint8x16_t __a, uint8x16_t __b, uint8x16_t __c) +{ + return (uint8x16_t)__builtin_neon_vbslv16qi ((int8x16_t) __a, (int8x16_t) __b, (int8x16_t) __c); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vbslq_u16 (uint16x8_t __a, uint16x8_t __b, uint16x8_t __c) +{ + return (uint16x8_t)__builtin_neon_vbslv8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x8_t) __c); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vbslq_u32 (uint32x4_t __a, uint32x4_t __b, uint32x4_t __c) +{ + return (uint32x4_t)__builtin_neon_vbslv4si ((int32x4_t) __a, (int32x4_t) __b, (int32x4_t) __c); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vbslq_u64 (uint64x2_t __a, uint64x2_t __b, uint64x2_t __c) +{ + return (uint64x2_t)__builtin_neon_vbslv2di ((int64x2_t) __a, (int64x2_t) __b, (int64x2_t) __c); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vbslq_p8 (uint8x16_t __a, poly8x16_t __b, poly8x16_t __c) +{ + return (poly8x16_t)__builtin_neon_vbslv16qi ((int8x16_t) __a, (int8x16_t) __b, (int8x16_t) __c); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vbslq_p16 (uint16x8_t __a, poly16x8_t __b, poly16x8_t __c) +{ + return (poly16x8_t)__builtin_neon_vbslv8hi ((int16x8_t) __a, (int16x8_t) __b, (int16x8_t) __c); +} + +__extension__ static __inline int8x8x2_t __attribute__ ((__always_inline__)) +vtrn_s8 (int8x8_t __a, int8x8_t __b) +{ + int8x8x2_t __rv; + __builtin_neon_vtrnv8qi (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline int16x4x2_t __attribute__ ((__always_inline__)) +vtrn_s16 (int16x4_t __a, int16x4_t __b) +{ + int16x4x2_t __rv; + __builtin_neon_vtrnv4hi (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline int32x2x2_t __attribute__ ((__always_inline__)) +vtrn_s32 (int32x2_t __a, int32x2_t __b) +{ + int32x2x2_t __rv; + __builtin_neon_vtrnv2si (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline float32x2x2_t __attribute__ ((__always_inline__)) +vtrn_f32 (float32x2_t __a, float32x2_t __b) +{ + float32x2x2_t __rv; + __builtin_neon_vtrnv2sf (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline uint8x8x2_t __attribute__ ((__always_inline__)) +vtrn_u8 (uint8x8_t __a, uint8x8_t __b) +{ + uint8x8x2_t __rv; + __builtin_neon_vtrnv8qi ((int8x8_t *) &__rv.val[0], (int8x8_t) __a, (int8x8_t) __b); + return __rv; +} + +__extension__ static __inline uint16x4x2_t __attribute__ ((__always_inline__)) +vtrn_u16 (uint16x4_t __a, uint16x4_t __b) +{ + uint16x4x2_t __rv; + __builtin_neon_vtrnv4hi ((int16x4_t *) &__rv.val[0], (int16x4_t) __a, (int16x4_t) __b); + return __rv; +} + +__extension__ static __inline uint32x2x2_t 
__attribute__ ((__always_inline__)) +vtrn_u32 (uint32x2_t __a, uint32x2_t __b) +{ + uint32x2x2_t __rv; + __builtin_neon_vtrnv2si ((int32x2_t *) &__rv.val[0], (int32x2_t) __a, (int32x2_t) __b); + return __rv; +} + +__extension__ static __inline poly8x8x2_t __attribute__ ((__always_inline__)) +vtrn_p8 (poly8x8_t __a, poly8x8_t __b) +{ + poly8x8x2_t __rv; + __builtin_neon_vtrnv8qi ((int8x8_t *) &__rv.val[0], (int8x8_t) __a, (int8x8_t) __b); + return __rv; +} + +__extension__ static __inline poly16x4x2_t __attribute__ ((__always_inline__)) +vtrn_p16 (poly16x4_t __a, poly16x4_t __b) +{ + poly16x4x2_t __rv; + __builtin_neon_vtrnv4hi ((int16x4_t *) &__rv.val[0], (int16x4_t) __a, (int16x4_t) __b); + return __rv; +} + +__extension__ static __inline int8x16x2_t __attribute__ ((__always_inline__)) +vtrnq_s8 (int8x16_t __a, int8x16_t __b) +{ + int8x16x2_t __rv; + __builtin_neon_vtrnv16qi (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline int16x8x2_t __attribute__ ((__always_inline__)) +vtrnq_s16 (int16x8_t __a, int16x8_t __b) +{ + int16x8x2_t __rv; + __builtin_neon_vtrnv8hi (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline int32x4x2_t __attribute__ ((__always_inline__)) +vtrnq_s32 (int32x4_t __a, int32x4_t __b) +{ + int32x4x2_t __rv; + __builtin_neon_vtrnv4si (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline float32x4x2_t __attribute__ ((__always_inline__)) +vtrnq_f32 (float32x4_t __a, float32x4_t __b) +{ + float32x4x2_t __rv; + __builtin_neon_vtrnv4sf (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline uint8x16x2_t __attribute__ ((__always_inline__)) +vtrnq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + uint8x16x2_t __rv; + __builtin_neon_vtrnv16qi ((int8x16_t *) &__rv.val[0], (int8x16_t) __a, (int8x16_t) __b); + return __rv; +} + +__extension__ static __inline uint16x8x2_t __attribute__ ((__always_inline__)) +vtrnq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + uint16x8x2_t __rv; + __builtin_neon_vtrnv8hi ((int16x8_t *) &__rv.val[0], (int16x8_t) __a, (int16x8_t) __b); + return __rv; +} + +__extension__ static __inline uint32x4x2_t __attribute__ ((__always_inline__)) +vtrnq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + uint32x4x2_t __rv; + __builtin_neon_vtrnv4si ((int32x4_t *) &__rv.val[0], (int32x4_t) __a, (int32x4_t) __b); + return __rv; +} + +__extension__ static __inline poly8x16x2_t __attribute__ ((__always_inline__)) +vtrnq_p8 (poly8x16_t __a, poly8x16_t __b) +{ + poly8x16x2_t __rv; + __builtin_neon_vtrnv16qi ((int8x16_t *) &__rv.val[0], (int8x16_t) __a, (int8x16_t) __b); + return __rv; +} + +__extension__ static __inline poly16x8x2_t __attribute__ ((__always_inline__)) +vtrnq_p16 (poly16x8_t __a, poly16x8_t __b) +{ + poly16x8x2_t __rv; + __builtin_neon_vtrnv8hi ((int16x8_t *) &__rv.val[0], (int16x8_t) __a, (int16x8_t) __b); + return __rv; +} + +__extension__ static __inline int8x8x2_t __attribute__ ((__always_inline__)) +vzip_s8 (int8x8_t __a, int8x8_t __b) +{ + int8x8x2_t __rv; + __builtin_neon_vzipv8qi (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline int16x4x2_t __attribute__ ((__always_inline__)) +vzip_s16 (int16x4_t __a, int16x4_t __b) +{ + int16x4x2_t __rv; + __builtin_neon_vzipv4hi (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline int32x2x2_t __attribute__ ((__always_inline__)) +vzip_s32 (int32x2_t __a, int32x2_t __b) +{ + int32x2x2_t __rv; + __builtin_neon_vzipv2si (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline 
float32x2x2_t __attribute__ ((__always_inline__)) +vzip_f32 (float32x2_t __a, float32x2_t __b) +{ + float32x2x2_t __rv; + __builtin_neon_vzipv2sf (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline uint8x8x2_t __attribute__ ((__always_inline__)) +vzip_u8 (uint8x8_t __a, uint8x8_t __b) +{ + uint8x8x2_t __rv; + __builtin_neon_vzipv8qi ((int8x8_t *) &__rv.val[0], (int8x8_t) __a, (int8x8_t) __b); + return __rv; +} + +__extension__ static __inline uint16x4x2_t __attribute__ ((__always_inline__)) +vzip_u16 (uint16x4_t __a, uint16x4_t __b) +{ + uint16x4x2_t __rv; + __builtin_neon_vzipv4hi ((int16x4_t *) &__rv.val[0], (int16x4_t) __a, (int16x4_t) __b); + return __rv; +} + +__extension__ static __inline uint32x2x2_t __attribute__ ((__always_inline__)) +vzip_u32 (uint32x2_t __a, uint32x2_t __b) +{ + uint32x2x2_t __rv; + __builtin_neon_vzipv2si ((int32x2_t *) &__rv.val[0], (int32x2_t) __a, (int32x2_t) __b); + return __rv; +} + +__extension__ static __inline poly8x8x2_t __attribute__ ((__always_inline__)) +vzip_p8 (poly8x8_t __a, poly8x8_t __b) +{ + poly8x8x2_t __rv; + __builtin_neon_vzipv8qi ((int8x8_t *) &__rv.val[0], (int8x8_t) __a, (int8x8_t) __b); + return __rv; +} + +__extension__ static __inline poly16x4x2_t __attribute__ ((__always_inline__)) +vzip_p16 (poly16x4_t __a, poly16x4_t __b) +{ + poly16x4x2_t __rv; + __builtin_neon_vzipv4hi ((int16x4_t *) &__rv.val[0], (int16x4_t) __a, (int16x4_t) __b); + return __rv; +} + +__extension__ static __inline int8x16x2_t __attribute__ ((__always_inline__)) +vzipq_s8 (int8x16_t __a, int8x16_t __b) +{ + int8x16x2_t __rv; + __builtin_neon_vzipv16qi (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline int16x8x2_t __attribute__ ((__always_inline__)) +vzipq_s16 (int16x8_t __a, int16x8_t __b) +{ + int16x8x2_t __rv; + __builtin_neon_vzipv8hi (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline int32x4x2_t __attribute__ ((__always_inline__)) +vzipq_s32 (int32x4_t __a, int32x4_t __b) +{ + int32x4x2_t __rv; + __builtin_neon_vzipv4si (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline float32x4x2_t __attribute__ ((__always_inline__)) +vzipq_f32 (float32x4_t __a, float32x4_t __b) +{ + float32x4x2_t __rv; + __builtin_neon_vzipv4sf (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline uint8x16x2_t __attribute__ ((__always_inline__)) +vzipq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + uint8x16x2_t __rv; + __builtin_neon_vzipv16qi ((int8x16_t *) &__rv.val[0], (int8x16_t) __a, (int8x16_t) __b); + return __rv; +} + +__extension__ static __inline uint16x8x2_t __attribute__ ((__always_inline__)) +vzipq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + uint16x8x2_t __rv; + __builtin_neon_vzipv8hi ((int16x8_t *) &__rv.val[0], (int16x8_t) __a, (int16x8_t) __b); + return __rv; +} + +__extension__ static __inline uint32x4x2_t __attribute__ ((__always_inline__)) +vzipq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + uint32x4x2_t __rv; + __builtin_neon_vzipv4si ((int32x4_t *) &__rv.val[0], (int32x4_t) __a, (int32x4_t) __b); + return __rv; +} + +__extension__ static __inline poly8x16x2_t __attribute__ ((__always_inline__)) +vzipq_p8 (poly8x16_t __a, poly8x16_t __b) +{ + poly8x16x2_t __rv; + __builtin_neon_vzipv16qi ((int8x16_t *) &__rv.val[0], (int8x16_t) __a, (int8x16_t) __b); + return __rv; +} + +__extension__ static __inline poly16x8x2_t __attribute__ ((__always_inline__)) +vzipq_p16 (poly16x8_t __a, poly16x8_t __b) +{ + poly16x8x2_t __rv; + __builtin_neon_vzipv8hi ((int16x8_t *) 
&__rv.val[0], (int16x8_t) __a, (int16x8_t) __b); + return __rv; +} + +__extension__ static __inline int8x8x2_t __attribute__ ((__always_inline__)) +vuzp_s8 (int8x8_t __a, int8x8_t __b) +{ + int8x8x2_t __rv; + __builtin_neon_vuzpv8qi (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline int16x4x2_t __attribute__ ((__always_inline__)) +vuzp_s16 (int16x4_t __a, int16x4_t __b) +{ + int16x4x2_t __rv; + __builtin_neon_vuzpv4hi (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline int32x2x2_t __attribute__ ((__always_inline__)) +vuzp_s32 (int32x2_t __a, int32x2_t __b) +{ + int32x2x2_t __rv; + __builtin_neon_vuzpv2si (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline float32x2x2_t __attribute__ ((__always_inline__)) +vuzp_f32 (float32x2_t __a, float32x2_t __b) +{ + float32x2x2_t __rv; + __builtin_neon_vuzpv2sf (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline uint8x8x2_t __attribute__ ((__always_inline__)) +vuzp_u8 (uint8x8_t __a, uint8x8_t __b) +{ + uint8x8x2_t __rv; + __builtin_neon_vuzpv8qi ((int8x8_t *) &__rv.val[0], (int8x8_t) __a, (int8x8_t) __b); + return __rv; +} + +__extension__ static __inline uint16x4x2_t __attribute__ ((__always_inline__)) +vuzp_u16 (uint16x4_t __a, uint16x4_t __b) +{ + uint16x4x2_t __rv; + __builtin_neon_vuzpv4hi ((int16x4_t *) &__rv.val[0], (int16x4_t) __a, (int16x4_t) __b); + return __rv; +} + +__extension__ static __inline uint32x2x2_t __attribute__ ((__always_inline__)) +vuzp_u32 (uint32x2_t __a, uint32x2_t __b) +{ + uint32x2x2_t __rv; + __builtin_neon_vuzpv2si ((int32x2_t *) &__rv.val[0], (int32x2_t) __a, (int32x2_t) __b); + return __rv; +} + +__extension__ static __inline poly8x8x2_t __attribute__ ((__always_inline__)) +vuzp_p8 (poly8x8_t __a, poly8x8_t __b) +{ + poly8x8x2_t __rv; + __builtin_neon_vuzpv8qi ((int8x8_t *) &__rv.val[0], (int8x8_t) __a, (int8x8_t) __b); + return __rv; +} + +__extension__ static __inline poly16x4x2_t __attribute__ ((__always_inline__)) +vuzp_p16 (poly16x4_t __a, poly16x4_t __b) +{ + poly16x4x2_t __rv; + __builtin_neon_vuzpv4hi ((int16x4_t *) &__rv.val[0], (int16x4_t) __a, (int16x4_t) __b); + return __rv; +} + +__extension__ static __inline int8x16x2_t __attribute__ ((__always_inline__)) +vuzpq_s8 (int8x16_t __a, int8x16_t __b) +{ + int8x16x2_t __rv; + __builtin_neon_vuzpv16qi (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline int16x8x2_t __attribute__ ((__always_inline__)) +vuzpq_s16 (int16x8_t __a, int16x8_t __b) +{ + int16x8x2_t __rv; + __builtin_neon_vuzpv8hi (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline int32x4x2_t __attribute__ ((__always_inline__)) +vuzpq_s32 (int32x4_t __a, int32x4_t __b) +{ + int32x4x2_t __rv; + __builtin_neon_vuzpv4si (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline float32x4x2_t __attribute__ ((__always_inline__)) +vuzpq_f32 (float32x4_t __a, float32x4_t __b) +{ + float32x4x2_t __rv; + __builtin_neon_vuzpv4sf (&__rv.val[0], __a, __b); + return __rv; +} + +__extension__ static __inline uint8x16x2_t __attribute__ ((__always_inline__)) +vuzpq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + uint8x16x2_t __rv; + __builtin_neon_vuzpv16qi ((int8x16_t *) &__rv.val[0], (int8x16_t) __a, (int8x16_t) __b); + return __rv; +} + +__extension__ static __inline uint16x8x2_t __attribute__ ((__always_inline__)) +vuzpq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + uint16x8x2_t __rv; + __builtin_neon_vuzpv8hi ((int16x8_t *) &__rv.val[0], (int16x8_t) __a, 
(int16x8_t) __b); + return __rv; +} + +__extension__ static __inline uint32x4x2_t __attribute__ ((__always_inline__)) +vuzpq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + uint32x4x2_t __rv; + __builtin_neon_vuzpv4si ((int32x4_t *) &__rv.val[0], (int32x4_t) __a, (int32x4_t) __b); + return __rv; +} + +__extension__ static __inline poly8x16x2_t __attribute__ ((__always_inline__)) +vuzpq_p8 (poly8x16_t __a, poly8x16_t __b) +{ + poly8x16x2_t __rv; + __builtin_neon_vuzpv16qi ((int8x16_t *) &__rv.val[0], (int8x16_t) __a, (int8x16_t) __b); + return __rv; +} + +__extension__ static __inline poly16x8x2_t __attribute__ ((__always_inline__)) +vuzpq_p16 (poly16x8_t __a, poly16x8_t __b) +{ + poly16x8x2_t __rv; + __builtin_neon_vuzpv8hi ((int16x8_t *) &__rv.val[0], (int16x8_t) __a, (int16x8_t) __b); + return __rv; +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vld1_s8 (const int8_t * __a) +{ + return (int8x8_t)__builtin_neon_vld1v8qi ((const __builtin_neon_qi *) __a); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vld1_s16 (const int16_t * __a) +{ + return (int16x4_t)__builtin_neon_vld1v4hi ((const __builtin_neon_hi *) __a); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vld1_s32 (const int32_t * __a) +{ + return (int32x2_t)__builtin_neon_vld1v2si ((const __builtin_neon_si *) __a); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vld1_s64 (const int64_t * __a) +{ + return (int64x1_t)__builtin_neon_vld1di ((const __builtin_neon_di *) __a); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vld1_f32 (const float32_t * __a) +{ + return (float32x2_t)__builtin_neon_vld1v2sf (__a); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vld1_u8 (const uint8_t * __a) +{ + return (uint8x8_t)__builtin_neon_vld1v8qi ((const __builtin_neon_qi *) __a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vld1_u16 (const uint16_t * __a) +{ + return (uint16x4_t)__builtin_neon_vld1v4hi ((const __builtin_neon_hi *) __a); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vld1_u32 (const uint32_t * __a) +{ + return (uint32x2_t)__builtin_neon_vld1v2si ((const __builtin_neon_si *) __a); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vld1_u64 (const uint64_t * __a) +{ + return (uint64x1_t)__builtin_neon_vld1di ((const __builtin_neon_di *) __a); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vld1_p8 (const poly8_t * __a) +{ + return (poly8x8_t)__builtin_neon_vld1v8qi ((const __builtin_neon_qi *) __a); +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vld1_p16 (const poly16_t * __a) +{ + return (poly16x4_t)__builtin_neon_vld1v4hi ((const __builtin_neon_hi *) __a); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vld1q_s8 (const int8_t * __a) +{ + return (int8x16_t)__builtin_neon_vld1v16qi ((const __builtin_neon_qi *) __a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vld1q_s16 (const int16_t * __a) +{ + return (int16x8_t)__builtin_neon_vld1v8hi ((const __builtin_neon_hi *) __a); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vld1q_s32 (const int32_t * __a) +{ + return (int32x4_t)__builtin_neon_vld1v4si ((const __builtin_neon_si *) __a); +} + +__extension__ 
static __inline int64x2_t __attribute__ ((__always_inline__)) +vld1q_s64 (const int64_t * __a) +{ + return (int64x2_t)__builtin_neon_vld1v2di ((const __builtin_neon_di *) __a); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vld1q_f32 (const float32_t * __a) +{ + return (float32x4_t)__builtin_neon_vld1v4sf (__a); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vld1q_u8 (const uint8_t * __a) +{ + return (uint8x16_t)__builtin_neon_vld1v16qi ((const __builtin_neon_qi *) __a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vld1q_u16 (const uint16_t * __a) +{ + return (uint16x8_t)__builtin_neon_vld1v8hi ((const __builtin_neon_hi *) __a); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vld1q_u32 (const uint32_t * __a) +{ + return (uint32x4_t)__builtin_neon_vld1v4si ((const __builtin_neon_si *) __a); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vld1q_u64 (const uint64_t * __a) +{ + return (uint64x2_t)__builtin_neon_vld1v2di ((const __builtin_neon_di *) __a); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vld1q_p8 (const poly8_t * __a) +{ + return (poly8x16_t)__builtin_neon_vld1v16qi ((const __builtin_neon_qi *) __a); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vld1q_p16 (const poly16_t * __a) +{ + return (poly16x8_t)__builtin_neon_vld1v8hi ((const __builtin_neon_hi *) __a); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vld1_lane_s8 (const int8_t * __a, int8x8_t __b, const int __c) +{ + return (int8x8_t)__builtin_neon_vld1_lanev8qi ((const __builtin_neon_qi *) __a, __b, __c); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vld1_lane_s16 (const int16_t * __a, int16x4_t __b, const int __c) +{ + return (int16x4_t)__builtin_neon_vld1_lanev4hi ((const __builtin_neon_hi *) __a, __b, __c); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vld1_lane_s32 (const int32_t * __a, int32x2_t __b, const int __c) +{ + return (int32x2_t)__builtin_neon_vld1_lanev2si ((const __builtin_neon_si *) __a, __b, __c); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vld1_lane_f32 (const float32_t * __a, float32x2_t __b, const int __c) +{ + return (float32x2_t)__builtin_neon_vld1_lanev2sf (__a, __b, __c); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vld1_lane_u8 (const uint8_t * __a, uint8x8_t __b, const int __c) +{ + return (uint8x8_t)__builtin_neon_vld1_lanev8qi ((const __builtin_neon_qi *) __a, (int8x8_t) __b, __c); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vld1_lane_u16 (const uint16_t * __a, uint16x4_t __b, const int __c) +{ + return (uint16x4_t)__builtin_neon_vld1_lanev4hi ((const __builtin_neon_hi *) __a, (int16x4_t) __b, __c); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vld1_lane_u32 (const uint32_t * __a, uint32x2_t __b, const int __c) +{ + return (uint32x2_t)__builtin_neon_vld1_lanev2si ((const __builtin_neon_si *) __a, (int32x2_t) __b, __c); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vld1_lane_p8 (const poly8_t * __a, poly8x8_t __b, const int __c) +{ + return (poly8x8_t)__builtin_neon_vld1_lanev8qi ((const __builtin_neon_qi *) __a, (int8x8_t) __b, __c); +} + 
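
The vld1 wrappers above cast the incoming pointer to the matching __builtin_neon_* element type, invoke the corresponding __builtin_neon_vld1* builtin, and cast the result back to the public vector type, so a full 64-bit (vld1_*) or 128-bit (vld1q_*) vector is loaded with a single VLD1. A minimal usage sketch, not part of the patch (the function name is illustrative only):

  #include <arm_neon.h>

  float32x4_t
  load_four (const float32_t *buf)
  {
    /* One 128-bit load filling all four lanes of a Q register.  */
    return vld1q_f32 (buf);
  }
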
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vld1_lane_p16 (const poly16_t * __a, poly16x4_t __b, const int __c) +{ + return (poly16x4_t)__builtin_neon_vld1_lanev4hi ((const __builtin_neon_hi *) __a, (int16x4_t) __b, __c); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vld1_lane_s64 (const int64_t * __a, int64x1_t __b, const int __c) +{ + return (int64x1_t)__builtin_neon_vld1_lanedi ((const __builtin_neon_di *) __a, __b, __c); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vld1_lane_u64 (const uint64_t * __a, uint64x1_t __b, const int __c) +{ + return (uint64x1_t)__builtin_neon_vld1_lanedi ((const __builtin_neon_di *) __a, (int64x1_t) __b, __c); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vld1q_lane_s8 (const int8_t * __a, int8x16_t __b, const int __c) +{ + return (int8x16_t)__builtin_neon_vld1_lanev16qi ((const __builtin_neon_qi *) __a, __b, __c); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vld1q_lane_s16 (const int16_t * __a, int16x8_t __b, const int __c) +{ + return (int16x8_t)__builtin_neon_vld1_lanev8hi ((const __builtin_neon_hi *) __a, __b, __c); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vld1q_lane_s32 (const int32_t * __a, int32x4_t __b, const int __c) +{ + return (int32x4_t)__builtin_neon_vld1_lanev4si ((const __builtin_neon_si *) __a, __b, __c); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vld1q_lane_f32 (const float32_t * __a, float32x4_t __b, const int __c) +{ + return (float32x4_t)__builtin_neon_vld1_lanev4sf (__a, __b, __c); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vld1q_lane_u8 (const uint8_t * __a, uint8x16_t __b, const int __c) +{ + return (uint8x16_t)__builtin_neon_vld1_lanev16qi ((const __builtin_neon_qi *) __a, (int8x16_t) __b, __c); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vld1q_lane_u16 (const uint16_t * __a, uint16x8_t __b, const int __c) +{ + return (uint16x8_t)__builtin_neon_vld1_lanev8hi ((const __builtin_neon_hi *) __a, (int16x8_t) __b, __c); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vld1q_lane_u32 (const uint32_t * __a, uint32x4_t __b, const int __c) +{ + return (uint32x4_t)__builtin_neon_vld1_lanev4si ((const __builtin_neon_si *) __a, (int32x4_t) __b, __c); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vld1q_lane_p8 (const poly8_t * __a, poly8x16_t __b, const int __c) +{ + return (poly8x16_t)__builtin_neon_vld1_lanev16qi ((const __builtin_neon_qi *) __a, (int8x16_t) __b, __c); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vld1q_lane_p16 (const poly16_t * __a, poly16x8_t __b, const int __c) +{ + return (poly16x8_t)__builtin_neon_vld1_lanev8hi ((const __builtin_neon_hi *) __a, (int16x8_t) __b, __c); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vld1q_lane_s64 (const int64_t * __a, int64x2_t __b, const int __c) +{ + return (int64x2_t)__builtin_neon_vld1_lanev2di ((const __builtin_neon_di *) __a, __b, __c); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vld1q_lane_u64 (const uint64_t * __a, uint64x2_t __b, const int __c) +{ + return (uint64x2_t)__builtin_neon_vld1_lanev2di ((const __builtin_neon_di *) __a, (int64x2_t) __b, __c); +} 
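
The _lane variants just defined load one element from memory into a chosen lane of an existing vector, leaving the other lanes untouched; the lane number is declared const int because it must be a compile-time constant. A usage sketch under the same caveats (illustrative name only):

  #include <arm_neon.h>

  int16x4_t
  patch_lane0 (const int16_t *p, int16x4_t v)
  {
    /* Replace only lane 0 of v with *p; lanes 1-3 are preserved.
       The lane index must be an integer constant expression.  */
    return vld1_lane_s16 (p, v, 0);
  }
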
+ +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vld1_dup_s8 (const int8_t * __a) +{ + return (int8x8_t)__builtin_neon_vld1_dupv8qi ((const __builtin_neon_qi *) __a); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vld1_dup_s16 (const int16_t * __a) +{ + return (int16x4_t)__builtin_neon_vld1_dupv4hi ((const __builtin_neon_hi *) __a); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vld1_dup_s32 (const int32_t * __a) +{ + return (int32x2_t)__builtin_neon_vld1_dupv2si ((const __builtin_neon_si *) __a); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vld1_dup_f32 (const float32_t * __a) +{ + return (float32x2_t)__builtin_neon_vld1_dupv2sf (__a); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vld1_dup_u8 (const uint8_t * __a) +{ + return (uint8x8_t)__builtin_neon_vld1_dupv8qi ((const __builtin_neon_qi *) __a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vld1_dup_u16 (const uint16_t * __a) +{ + return (uint16x4_t)__builtin_neon_vld1_dupv4hi ((const __builtin_neon_hi *) __a); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vld1_dup_u32 (const uint32_t * __a) +{ + return (uint32x2_t)__builtin_neon_vld1_dupv2si ((const __builtin_neon_si *) __a); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vld1_dup_p8 (const poly8_t * __a) +{ + return (poly8x8_t)__builtin_neon_vld1_dupv8qi ((const __builtin_neon_qi *) __a); +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vld1_dup_p16 (const poly16_t * __a) +{ + return (poly16x4_t)__builtin_neon_vld1_dupv4hi ((const __builtin_neon_hi *) __a); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vld1_dup_s64 (const int64_t * __a) +{ + return (int64x1_t)__builtin_neon_vld1_dupdi ((const __builtin_neon_di *) __a); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vld1_dup_u64 (const uint64_t * __a) +{ + return (uint64x1_t)__builtin_neon_vld1_dupdi ((const __builtin_neon_di *) __a); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vld1q_dup_s8 (const int8_t * __a) +{ + return (int8x16_t)__builtin_neon_vld1_dupv16qi ((const __builtin_neon_qi *) __a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vld1q_dup_s16 (const int16_t * __a) +{ + return (int16x8_t)__builtin_neon_vld1_dupv8hi ((const __builtin_neon_hi *) __a); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vld1q_dup_s32 (const int32_t * __a) +{ + return (int32x4_t)__builtin_neon_vld1_dupv4si ((const __builtin_neon_si *) __a); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vld1q_dup_f32 (const float32_t * __a) +{ + return (float32x4_t)__builtin_neon_vld1_dupv4sf (__a); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vld1q_dup_u8 (const uint8_t * __a) +{ + return (uint8x16_t)__builtin_neon_vld1_dupv16qi ((const __builtin_neon_qi *) __a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vld1q_dup_u16 (const uint16_t * __a) +{ + return (uint16x8_t)__builtin_neon_vld1_dupv8hi ((const __builtin_neon_hi *) __a); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vld1q_dup_u32 (const uint32_t * __a) 
+{ + return (uint32x4_t)__builtin_neon_vld1_dupv4si ((const __builtin_neon_si *) __a); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vld1q_dup_p8 (const poly8_t * __a) +{ + return (poly8x16_t)__builtin_neon_vld1_dupv16qi ((const __builtin_neon_qi *) __a); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vld1q_dup_p16 (const poly16_t * __a) +{ + return (poly16x8_t)__builtin_neon_vld1_dupv8hi ((const __builtin_neon_hi *) __a); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vld1q_dup_s64 (const int64_t * __a) +{ + return (int64x2_t)__builtin_neon_vld1_dupv2di ((const __builtin_neon_di *) __a); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vld1q_dup_u64 (const uint64_t * __a) +{ + return (uint64x2_t)__builtin_neon_vld1_dupv2di ((const __builtin_neon_di *) __a); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_s8 (int8_t * __a, int8x8_t __b) +{ + __builtin_neon_vst1v8qi ((__builtin_neon_qi *) __a, __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_s16 (int16_t * __a, int16x4_t __b) +{ + __builtin_neon_vst1v4hi ((__builtin_neon_hi *) __a, __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_s32 (int32_t * __a, int32x2_t __b) +{ + __builtin_neon_vst1v2si ((__builtin_neon_si *) __a, __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_s64 (int64_t * __a, int64x1_t __b) +{ + __builtin_neon_vst1di ((__builtin_neon_di *) __a, __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_f32 (float32_t * __a, float32x2_t __b) +{ + __builtin_neon_vst1v2sf (__a, __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_u8 (uint8_t * __a, uint8x8_t __b) +{ + __builtin_neon_vst1v8qi ((__builtin_neon_qi *) __a, (int8x8_t) __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_u16 (uint16_t * __a, uint16x4_t __b) +{ + __builtin_neon_vst1v4hi ((__builtin_neon_hi *) __a, (int16x4_t) __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_u32 (uint32_t * __a, uint32x2_t __b) +{ + __builtin_neon_vst1v2si ((__builtin_neon_si *) __a, (int32x2_t) __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_u64 (uint64_t * __a, uint64x1_t __b) +{ + __builtin_neon_vst1di ((__builtin_neon_di *) __a, (int64x1_t) __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_p8 (poly8_t * __a, poly8x8_t __b) +{ + __builtin_neon_vst1v8qi ((__builtin_neon_qi *) __a, (int8x8_t) __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_p16 (poly16_t * __a, poly16x4_t __b) +{ + __builtin_neon_vst1v4hi ((__builtin_neon_hi *) __a, (int16x4_t) __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_s8 (int8_t * __a, int8x16_t __b) +{ + __builtin_neon_vst1v16qi ((__builtin_neon_qi *) __a, __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_s16 (int16_t * __a, int16x8_t __b) +{ + __builtin_neon_vst1v8hi ((__builtin_neon_hi *) __a, __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_s32 (int32_t * __a, int32x4_t __b) +{ + __builtin_neon_vst1v4si ((__builtin_neon_si *) __a, __b); +} + +__extension__ static __inline void 
__attribute__ ((__always_inline__)) +vst1q_s64 (int64_t * __a, int64x2_t __b) +{ + __builtin_neon_vst1v2di ((__builtin_neon_di *) __a, __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_f32 (float32_t * __a, float32x4_t __b) +{ + __builtin_neon_vst1v4sf (__a, __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_u8 (uint8_t * __a, uint8x16_t __b) +{ + __builtin_neon_vst1v16qi ((__builtin_neon_qi *) __a, (int8x16_t) __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_u16 (uint16_t * __a, uint16x8_t __b) +{ + __builtin_neon_vst1v8hi ((__builtin_neon_hi *) __a, (int16x8_t) __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_u32 (uint32_t * __a, uint32x4_t __b) +{ + __builtin_neon_vst1v4si ((__builtin_neon_si *) __a, (int32x4_t) __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_u64 (uint64_t * __a, uint64x2_t __b) +{ + __builtin_neon_vst1v2di ((__builtin_neon_di *) __a, (int64x2_t) __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_p8 (poly8_t * __a, poly8x16_t __b) +{ + __builtin_neon_vst1v16qi ((__builtin_neon_qi *) __a, (int8x16_t) __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_p16 (poly16_t * __a, poly16x8_t __b) +{ + __builtin_neon_vst1v8hi ((__builtin_neon_hi *) __a, (int16x8_t) __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_lane_s8 (int8_t * __a, int8x8_t __b, const int __c) +{ + __builtin_neon_vst1_lanev8qi ((__builtin_neon_qi *) __a, __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_lane_s16 (int16_t * __a, int16x4_t __b, const int __c) +{ + __builtin_neon_vst1_lanev4hi ((__builtin_neon_hi *) __a, __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_lane_s32 (int32_t * __a, int32x2_t __b, const int __c) +{ + __builtin_neon_vst1_lanev2si ((__builtin_neon_si *) __a, __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_lane_f32 (float32_t * __a, float32x2_t __b, const int __c) +{ + __builtin_neon_vst1_lanev2sf (__a, __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_lane_u8 (uint8_t * __a, uint8x8_t __b, const int __c) +{ + __builtin_neon_vst1_lanev8qi ((__builtin_neon_qi *) __a, (int8x8_t) __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_lane_u16 (uint16_t * __a, uint16x4_t __b, const int __c) +{ + __builtin_neon_vst1_lanev4hi ((__builtin_neon_hi *) __a, (int16x4_t) __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_lane_u32 (uint32_t * __a, uint32x2_t __b, const int __c) +{ + __builtin_neon_vst1_lanev2si ((__builtin_neon_si *) __a, (int32x2_t) __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_lane_p8 (poly8_t * __a, poly8x8_t __b, const int __c) +{ + __builtin_neon_vst1_lanev8qi ((__builtin_neon_qi *) __a, (int8x8_t) __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_lane_p16 (poly16_t * __a, poly16x4_t __b, const int __c) +{ + __builtin_neon_vst1_lanev4hi ((__builtin_neon_hi *) __a, (int16x4_t) __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_lane_s64 (int64_t * __a, int64x1_t __b, const int 
__c) +{ + __builtin_neon_vst1_lanedi ((__builtin_neon_di *) __a, __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_lane_u64 (uint64_t * __a, uint64x1_t __b, const int __c) +{ + __builtin_neon_vst1_lanedi ((__builtin_neon_di *) __a, (int64x1_t) __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_lane_s8 (int8_t * __a, int8x16_t __b, const int __c) +{ + __builtin_neon_vst1_lanev16qi ((__builtin_neon_qi *) __a, __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_lane_s16 (int16_t * __a, int16x8_t __b, const int __c) +{ + __builtin_neon_vst1_lanev8hi ((__builtin_neon_hi *) __a, __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_lane_s32 (int32_t * __a, int32x4_t __b, const int __c) +{ + __builtin_neon_vst1_lanev4si ((__builtin_neon_si *) __a, __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_lane_f32 (float32_t * __a, float32x4_t __b, const int __c) +{ + __builtin_neon_vst1_lanev4sf (__a, __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_lane_u8 (uint8_t * __a, uint8x16_t __b, const int __c) +{ + __builtin_neon_vst1_lanev16qi ((__builtin_neon_qi *) __a, (int8x16_t) __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_lane_u16 (uint16_t * __a, uint16x8_t __b, const int __c) +{ + __builtin_neon_vst1_lanev8hi ((__builtin_neon_hi *) __a, (int16x8_t) __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_lane_u32 (uint32_t * __a, uint32x4_t __b, const int __c) +{ + __builtin_neon_vst1_lanev4si ((__builtin_neon_si *) __a, (int32x4_t) __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_lane_p8 (poly8_t * __a, poly8x16_t __b, const int __c) +{ + __builtin_neon_vst1_lanev16qi ((__builtin_neon_qi *) __a, (int8x16_t) __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_lane_p16 (poly16_t * __a, poly16x8_t __b, const int __c) +{ + __builtin_neon_vst1_lanev8hi ((__builtin_neon_hi *) __a, (int16x8_t) __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_lane_s64 (int64_t * __a, int64x2_t __b, const int __c) +{ + __builtin_neon_vst1_lanev2di ((__builtin_neon_di *) __a, __b, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_lane_u64 (uint64_t * __a, uint64x2_t __b, const int __c) +{ + __builtin_neon_vst1_lanev2di ((__builtin_neon_di *) __a, (int64x2_t) __b, __c); +} + +__extension__ static __inline int8x8x2_t __attribute__ ((__always_inline__)) +vld2_s8 (const int8_t * __a) +{ + union { int8x8x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2v8qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline int16x4x2_t __attribute__ ((__always_inline__)) +vld2_s16 (const int16_t * __a) +{ + union { int16x4x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2v4hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline int32x2x2_t __attribute__ ((__always_inline__)) +vld2_s32 (const int32_t * __a) +{ + union { int32x2x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2v2si ((const __builtin_neon_si *) __a); + return __rv.__i; +} + +__extension__ static __inline float32x2x2_t 
__attribute__ ((__always_inline__)) +vld2_f32 (const float32_t * __a) +{ + union { float32x2x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2v2sf (__a); + return __rv.__i; +} + +__extension__ static __inline uint8x8x2_t __attribute__ ((__always_inline__)) +vld2_u8 (const uint8_t * __a) +{ + union { uint8x8x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2v8qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline uint16x4x2_t __attribute__ ((__always_inline__)) +vld2_u16 (const uint16_t * __a) +{ + union { uint16x4x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2v4hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline uint32x2x2_t __attribute__ ((__always_inline__)) +vld2_u32 (const uint32_t * __a) +{ + union { uint32x2x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2v2si ((const __builtin_neon_si *) __a); + return __rv.__i; +} + +__extension__ static __inline poly8x8x2_t __attribute__ ((__always_inline__)) +vld2_p8 (const poly8_t * __a) +{ + union { poly8x8x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2v8qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline poly16x4x2_t __attribute__ ((__always_inline__)) +vld2_p16 (const poly16_t * __a) +{ + union { poly16x4x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2v4hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline int64x1x2_t __attribute__ ((__always_inline__)) +vld2_s64 (const int64_t * __a) +{ + union { int64x1x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2di ((const __builtin_neon_di *) __a); + return __rv.__i; +} + +__extension__ static __inline uint64x1x2_t __attribute__ ((__always_inline__)) +vld2_u64 (const uint64_t * __a) +{ + union { uint64x1x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2di ((const __builtin_neon_di *) __a); + return __rv.__i; +} + +__extension__ static __inline int8x16x2_t __attribute__ ((__always_inline__)) +vld2q_s8 (const int8_t * __a) +{ + union { int8x16x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2v16qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline int16x8x2_t __attribute__ ((__always_inline__)) +vld2q_s16 (const int16_t * __a) +{ + union { int16x8x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2v8hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline int32x4x2_t __attribute__ ((__always_inline__)) +vld2q_s32 (const int32_t * __a) +{ + union { int32x4x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2v4si ((const __builtin_neon_si *) __a); + return __rv.__i; +} + +__extension__ static __inline float32x4x2_t __attribute__ ((__always_inline__)) +vld2q_f32 (const float32_t * __a) +{ + union { float32x4x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2v4sf (__a); + return __rv.__i; +} + +__extension__ static __inline uint8x16x2_t __attribute__ ((__always_inline__)) +vld2q_u8 (const uint8_t * __a) +{ + union { uint8x16x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2v16qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline uint16x8x2_t __attribute__ ((__always_inline__)) +vld2q_u16 (const uint16_t * __a) +{ + union { uint16x8x2_t __i; 
__builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2v8hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline uint32x4x2_t __attribute__ ((__always_inline__)) +vld2q_u32 (const uint32_t * __a) +{ + union { uint32x4x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2v4si ((const __builtin_neon_si *) __a); + return __rv.__i; +} + +__extension__ static __inline poly8x16x2_t __attribute__ ((__always_inline__)) +vld2q_p8 (const poly8_t * __a) +{ + union { poly8x16x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2v16qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline poly16x8x2_t __attribute__ ((__always_inline__)) +vld2q_p16 (const poly16_t * __a) +{ + union { poly16x8x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2v8hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline int8x8x2_t __attribute__ ((__always_inline__)) +vld2_lane_s8 (const int8_t * __a, int8x8x2_t __b, const int __c) +{ + union { int8x8x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + union { int8x8x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_lanev8qi ((const __builtin_neon_qi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline int16x4x2_t __attribute__ ((__always_inline__)) +vld2_lane_s16 (const int16_t * __a, int16x4x2_t __b, const int __c) +{ + union { int16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + union { int16x4x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_lanev4hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline int32x2x2_t __attribute__ ((__always_inline__)) +vld2_lane_s32 (const int32_t * __a, int32x2x2_t __b, const int __c) +{ + union { int32x2x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + union { int32x2x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_lanev2si ((const __builtin_neon_si *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline float32x2x2_t __attribute__ ((__always_inline__)) +vld2_lane_f32 (const float32_t * __a, float32x2x2_t __b, const int __c) +{ + union { float32x2x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + union { float32x2x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_lanev2sf (__a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline uint8x8x2_t __attribute__ ((__always_inline__)) +vld2_lane_u8 (const uint8_t * __a, uint8x8x2_t __b, const int __c) +{ + union { uint8x8x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + union { uint8x8x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_lanev8qi ((const __builtin_neon_qi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline uint16x4x2_t __attribute__ ((__always_inline__)) +vld2_lane_u16 (const uint16_t * __a, uint16x4x2_t __b, const int __c) +{ + union { uint16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + union { uint16x4x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_lanev4hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline uint32x2x2_t __attribute__ ((__always_inline__)) +vld2_lane_u32 (const uint32_t * __a, uint32x2x2_t __b, const int __c) +{ + union { uint32x2x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + union { uint32x2x2_t __i; __builtin_neon_ti __o; 
} __rv; + __rv.__o = __builtin_neon_vld2_lanev2si ((const __builtin_neon_si *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline poly8x8x2_t __attribute__ ((__always_inline__)) +vld2_lane_p8 (const poly8_t * __a, poly8x8x2_t __b, const int __c) +{ + union { poly8x8x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + union { poly8x8x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_lanev8qi ((const __builtin_neon_qi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline poly16x4x2_t __attribute__ ((__always_inline__)) +vld2_lane_p16 (const poly16_t * __a, poly16x4x2_t __b, const int __c) +{ + union { poly16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + union { poly16x4x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_lanev4hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline int16x8x2_t __attribute__ ((__always_inline__)) +vld2q_lane_s16 (const int16_t * __a, int16x8x2_t __b, const int __c) +{ + union { int16x8x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + union { int16x8x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2_lanev8hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline int32x4x2_t __attribute__ ((__always_inline__)) +vld2q_lane_s32 (const int32_t * __a, int32x4x2_t __b, const int __c) +{ + union { int32x4x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + union { int32x4x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2_lanev4si ((const __builtin_neon_si *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline float32x4x2_t __attribute__ ((__always_inline__)) +vld2q_lane_f32 (const float32_t * __a, float32x4x2_t __b, const int __c) +{ + union { float32x4x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + union { float32x4x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2_lanev4sf (__a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline uint16x8x2_t __attribute__ ((__always_inline__)) +vld2q_lane_u16 (const uint16_t * __a, uint16x8x2_t __b, const int __c) +{ + union { uint16x8x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + union { uint16x8x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2_lanev8hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline uint32x4x2_t __attribute__ ((__always_inline__)) +vld2q_lane_u32 (const uint32_t * __a, uint32x4x2_t __b, const int __c) +{ + union { uint32x4x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + union { uint32x4x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2_lanev4si ((const __builtin_neon_si *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline poly16x8x2_t __attribute__ ((__always_inline__)) +vld2q_lane_p16 (const poly16_t * __a, poly16x8x2_t __b, const int __c) +{ + union { poly16x8x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + union { poly16x8x2_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld2_lanev8hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline int8x8x2_t __attribute__ ((__always_inline__)) +vld2_dup_s8 (const int8_t * __a) +{ + union { int8x8x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_dupv8qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + 
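
The vld2 wrappers return two de-interleaved vectors at once. Because the builtins yield a single opaque value (__builtin_neon_ti for a pair of 64-bit vectors, __builtin_neon_oi for a pair of 128-bit ones), each wrapper reinterprets that value as the public x2 structure through a union rather than copying element by element. A hypothetical example combining vld2 with the vst1 stores defined earlier; the function and buffer names are assumptions, not part of the patch:

  #include <arm_neon.h>

  void
  split_stereo (const int16_t *interleaved, int16_t *left, int16_t *right)
  {
    /* VLD2 de-interleaves: val[0] receives elements 0,2,4,6 and
       val[1] receives elements 1,3,5,7 of the 8 loaded values.  */
    int16x4x2_t s = vld2_s16 (interleaved);
    vst1_s16 (left, s.val[0]);
    vst1_s16 (right, s.val[1]);
  }
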
+__extension__ static __inline int16x4x2_t __attribute__ ((__always_inline__)) +vld2_dup_s16 (const int16_t * __a) +{ + union { int16x4x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_dupv4hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline int32x2x2_t __attribute__ ((__always_inline__)) +vld2_dup_s32 (const int32_t * __a) +{ + union { int32x2x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_dupv2si ((const __builtin_neon_si *) __a); + return __rv.__i; +} + +__extension__ static __inline float32x2x2_t __attribute__ ((__always_inline__)) +vld2_dup_f32 (const float32_t * __a) +{ + union { float32x2x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_dupv2sf (__a); + return __rv.__i; +} + +__extension__ static __inline uint8x8x2_t __attribute__ ((__always_inline__)) +vld2_dup_u8 (const uint8_t * __a) +{ + union { uint8x8x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_dupv8qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline uint16x4x2_t __attribute__ ((__always_inline__)) +vld2_dup_u16 (const uint16_t * __a) +{ + union { uint16x4x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_dupv4hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline uint32x2x2_t __attribute__ ((__always_inline__)) +vld2_dup_u32 (const uint32_t * __a) +{ + union { uint32x2x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_dupv2si ((const __builtin_neon_si *) __a); + return __rv.__i; +} + +__extension__ static __inline poly8x8x2_t __attribute__ ((__always_inline__)) +vld2_dup_p8 (const poly8_t * __a) +{ + union { poly8x8x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_dupv8qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline poly16x4x2_t __attribute__ ((__always_inline__)) +vld2_dup_p16 (const poly16_t * __a) +{ + union { poly16x4x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_dupv4hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline int64x1x2_t __attribute__ ((__always_inline__)) +vld2_dup_s64 (const int64_t * __a) +{ + union { int64x1x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_dupdi ((const __builtin_neon_di *) __a); + return __rv.__i; +} + +__extension__ static __inline uint64x1x2_t __attribute__ ((__always_inline__)) +vld2_dup_u64 (const uint64_t * __a) +{ + union { uint64x1x2_t __i; __builtin_neon_ti __o; } __rv; + __rv.__o = __builtin_neon_vld2_dupdi ((const __builtin_neon_di *) __a); + return __rv.__i; +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_s8 (int8_t * __a, int8x8x2_t __b) +{ + union { int8x8x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2v8qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_s16 (int16_t * __a, int16x4x2_t __b) +{ + union { int16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2v4hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_s32 (int32_t * __a, int32x2x2_t __b) +{ + union { int32x2x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2v2si ((__builtin_neon_si *) __a, __bu.__o); +} + +__extension__ static __inline void 
__attribute__ ((__always_inline__)) +vst2_f32 (float32_t * __a, float32x2x2_t __b) +{ + union { float32x2x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2v2sf (__a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_u8 (uint8_t * __a, uint8x8x2_t __b) +{ + union { uint8x8x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2v8qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_u16 (uint16_t * __a, uint16x4x2_t __b) +{ + union { uint16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2v4hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_u32 (uint32_t * __a, uint32x2x2_t __b) +{ + union { uint32x2x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2v2si ((__builtin_neon_si *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_p8 (poly8_t * __a, poly8x8x2_t __b) +{ + union { poly8x8x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2v8qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_p16 (poly16_t * __a, poly16x4x2_t __b) +{ + union { poly16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2v4hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_s64 (int64_t * __a, int64x1x2_t __b) +{ + union { int64x1x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2di ((__builtin_neon_di *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_u64 (uint64_t * __a, uint64x1x2_t __b) +{ + union { uint64x1x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2di ((__builtin_neon_di *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2q_s8 (int8_t * __a, int8x16x2_t __b) +{ + union { int8x16x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst2v16qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2q_s16 (int16_t * __a, int16x8x2_t __b) +{ + union { int16x8x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst2v8hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2q_s32 (int32_t * __a, int32x4x2_t __b) +{ + union { int32x4x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst2v4si ((__builtin_neon_si *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2q_f32 (float32_t * __a, float32x4x2_t __b) +{ + union { float32x4x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst2v4sf (__a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2q_u8 (uint8_t * __a, uint8x16x2_t __b) +{ + union { uint8x16x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst2v16qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2q_u16 (uint16_t * __a, uint16x8x2_t __b) +{ + union { uint16x8x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst2v8hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static 
__inline void __attribute__ ((__always_inline__)) +vst2q_u32 (uint32_t * __a, uint32x4x2_t __b) +{ + union { uint32x4x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst2v4si ((__builtin_neon_si *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2q_p8 (poly8_t * __a, poly8x16x2_t __b) +{ + union { poly8x16x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst2v16qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2q_p16 (poly16_t * __a, poly16x8x2_t __b) +{ + union { poly16x8x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst2v8hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_lane_s8 (int8_t * __a, int8x8x2_t __b, const int __c) +{ + union { int8x8x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2_lanev8qi ((__builtin_neon_qi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_lane_s16 (int16_t * __a, int16x4x2_t __b, const int __c) +{ + union { int16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2_lanev4hi ((__builtin_neon_hi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_lane_s32 (int32_t * __a, int32x2x2_t __b, const int __c) +{ + union { int32x2x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2_lanev2si ((__builtin_neon_si *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_lane_f32 (float32_t * __a, float32x2x2_t __b, const int __c) +{ + union { float32x2x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2_lanev2sf (__a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_lane_u8 (uint8_t * __a, uint8x8x2_t __b, const int __c) +{ + union { uint8x8x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2_lanev8qi ((__builtin_neon_qi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_lane_u16 (uint16_t * __a, uint16x4x2_t __b, const int __c) +{ + union { uint16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2_lanev4hi ((__builtin_neon_hi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_lane_u32 (uint32_t * __a, uint32x2x2_t __b, const int __c) +{ + union { uint32x2x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2_lanev2si ((__builtin_neon_si *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_lane_p8 (poly8_t * __a, poly8x8x2_t __b, const int __c) +{ + union { poly8x8x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2_lanev8qi ((__builtin_neon_qi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2_lane_p16 (poly16_t * __a, poly16x4x2_t __b, const int __c) +{ + union { poly16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b }; + __builtin_neon_vst2_lanev4hi ((__builtin_neon_hi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2q_lane_s16 (int16_t * __a, int16x8x2_t __b, const int __c) +{ + union { int16x8x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + 
__builtin_neon_vst2_lanev8hi ((__builtin_neon_hi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2q_lane_s32 (int32_t * __a, int32x4x2_t __b, const int __c) +{ + union { int32x4x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst2_lanev4si ((__builtin_neon_si *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2q_lane_f32 (float32_t * __a, float32x4x2_t __b, const int __c) +{ + union { float32x4x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst2_lanev4sf (__a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2q_lane_u16 (uint16_t * __a, uint16x8x2_t __b, const int __c) +{ + union { uint16x8x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst2_lanev8hi ((__builtin_neon_hi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2q_lane_u32 (uint32_t * __a, uint32x4x2_t __b, const int __c) +{ + union { uint32x4x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst2_lanev4si ((__builtin_neon_si *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst2q_lane_p16 (poly16_t * __a, poly16x8x2_t __b, const int __c) +{ + union { poly16x8x2_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst2_lanev8hi ((__builtin_neon_hi *) __a, __bu.__o, __c); +} + +__extension__ static __inline int8x8x3_t __attribute__ ((__always_inline__)) +vld3_s8 (const int8_t * __a) +{ + union { int8x8x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3v8qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline int16x4x3_t __attribute__ ((__always_inline__)) +vld3_s16 (const int16_t * __a) +{ + union { int16x4x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3v4hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline int32x2x3_t __attribute__ ((__always_inline__)) +vld3_s32 (const int32_t * __a) +{ + union { int32x2x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3v2si ((const __builtin_neon_si *) __a); + return __rv.__i; +} + +__extension__ static __inline float32x2x3_t __attribute__ ((__always_inline__)) +vld3_f32 (const float32_t * __a) +{ + union { float32x2x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3v2sf (__a); + return __rv.__i; +} + +__extension__ static __inline uint8x8x3_t __attribute__ ((__always_inline__)) +vld3_u8 (const uint8_t * __a) +{ + union { uint8x8x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3v8qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline uint16x4x3_t __attribute__ ((__always_inline__)) +vld3_u16 (const uint16_t * __a) +{ + union { uint16x4x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3v4hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline uint32x2x3_t __attribute__ ((__always_inline__)) +vld3_u32 (const uint32_t * __a) +{ + union { uint32x2x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3v2si ((const __builtin_neon_si *) __a); + return __rv.__i; +} + +__extension__ static __inline poly8x8x3_t __attribute__ ((__always_inline__)) +vld3_p8 (const poly8_t * __a) +{ + union { poly8x8x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = 
__builtin_neon_vld3v8qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline poly16x4x3_t __attribute__ ((__always_inline__)) +vld3_p16 (const poly16_t * __a) +{ + union { poly16x4x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3v4hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline int64x1x3_t __attribute__ ((__always_inline__)) +vld3_s64 (const int64_t * __a) +{ + union { int64x1x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3di ((const __builtin_neon_di *) __a); + return __rv.__i; +} + +__extension__ static __inline uint64x1x3_t __attribute__ ((__always_inline__)) +vld3_u64 (const uint64_t * __a) +{ + union { uint64x1x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3di ((const __builtin_neon_di *) __a); + return __rv.__i; +} + +__extension__ static __inline int8x16x3_t __attribute__ ((__always_inline__)) +vld3q_s8 (const int8_t * __a) +{ + union { int8x16x3_t __i; __builtin_neon_ci __o; } __rv; + __rv.__o = __builtin_neon_vld3v16qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline int16x8x3_t __attribute__ ((__always_inline__)) +vld3q_s16 (const int16_t * __a) +{ + union { int16x8x3_t __i; __builtin_neon_ci __o; } __rv; + __rv.__o = __builtin_neon_vld3v8hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline int32x4x3_t __attribute__ ((__always_inline__)) +vld3q_s32 (const int32_t * __a) +{ + union { int32x4x3_t __i; __builtin_neon_ci __o; } __rv; + __rv.__o = __builtin_neon_vld3v4si ((const __builtin_neon_si *) __a); + return __rv.__i; +} + +__extension__ static __inline float32x4x3_t __attribute__ ((__always_inline__)) +vld3q_f32 (const float32_t * __a) +{ + union { float32x4x3_t __i; __builtin_neon_ci __o; } __rv; + __rv.__o = __builtin_neon_vld3v4sf (__a); + return __rv.__i; +} + +__extension__ static __inline uint8x16x3_t __attribute__ ((__always_inline__)) +vld3q_u8 (const uint8_t * __a) +{ + union { uint8x16x3_t __i; __builtin_neon_ci __o; } __rv; + __rv.__o = __builtin_neon_vld3v16qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline uint16x8x3_t __attribute__ ((__always_inline__)) +vld3q_u16 (const uint16_t * __a) +{ + union { uint16x8x3_t __i; __builtin_neon_ci __o; } __rv; + __rv.__o = __builtin_neon_vld3v8hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline uint32x4x3_t __attribute__ ((__always_inline__)) +vld3q_u32 (const uint32_t * __a) +{ + union { uint32x4x3_t __i; __builtin_neon_ci __o; } __rv; + __rv.__o = __builtin_neon_vld3v4si ((const __builtin_neon_si *) __a); + return __rv.__i; +} + +__extension__ static __inline poly8x16x3_t __attribute__ ((__always_inline__)) +vld3q_p8 (const poly8_t * __a) +{ + union { poly8x16x3_t __i; __builtin_neon_ci __o; } __rv; + __rv.__o = __builtin_neon_vld3v16qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline poly16x8x3_t __attribute__ ((__always_inline__)) +vld3q_p16 (const poly16_t * __a) +{ + union { poly16x8x3_t __i; __builtin_neon_ci __o; } __rv; + __rv.__o = __builtin_neon_vld3v8hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline int8x8x3_t __attribute__ ((__always_inline__)) +vld3_lane_s8 (const int8_t * __a, int8x8x3_t __b, const int __c) +{ + union { int8x8x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + union { int8x8x3_t __i; 
__builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_lanev8qi ((const __builtin_neon_qi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline int16x4x3_t __attribute__ ((__always_inline__)) +vld3_lane_s16 (const int16_t * __a, int16x4x3_t __b, const int __c) +{ + union { int16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + union { int16x4x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_lanev4hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline int32x2x3_t __attribute__ ((__always_inline__)) +vld3_lane_s32 (const int32_t * __a, int32x2x3_t __b, const int __c) +{ + union { int32x2x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + union { int32x2x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_lanev2si ((const __builtin_neon_si *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline float32x2x3_t __attribute__ ((__always_inline__)) +vld3_lane_f32 (const float32_t * __a, float32x2x3_t __b, const int __c) +{ + union { float32x2x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + union { float32x2x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_lanev2sf (__a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline uint8x8x3_t __attribute__ ((__always_inline__)) +vld3_lane_u8 (const uint8_t * __a, uint8x8x3_t __b, const int __c) +{ + union { uint8x8x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + union { uint8x8x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_lanev8qi ((const __builtin_neon_qi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline uint16x4x3_t __attribute__ ((__always_inline__)) +vld3_lane_u16 (const uint16_t * __a, uint16x4x3_t __b, const int __c) +{ + union { uint16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + union { uint16x4x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_lanev4hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline uint32x2x3_t __attribute__ ((__always_inline__)) +vld3_lane_u32 (const uint32_t * __a, uint32x2x3_t __b, const int __c) +{ + union { uint32x2x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + union { uint32x2x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_lanev2si ((const __builtin_neon_si *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline poly8x8x3_t __attribute__ ((__always_inline__)) +vld3_lane_p8 (const poly8_t * __a, poly8x8x3_t __b, const int __c) +{ + union { poly8x8x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + union { poly8x8x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_lanev8qi ((const __builtin_neon_qi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline poly16x4x3_t __attribute__ ((__always_inline__)) +vld3_lane_p16 (const poly16_t * __a, poly16x4x3_t __b, const int __c) +{ + union { poly16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + union { poly16x4x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_lanev4hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline int16x8x3_t __attribute__ ((__always_inline__)) +vld3q_lane_s16 (const int16_t * __a, int16x8x3_t __b, const int __c) +{ + union { int16x8x3_t __i; __builtin_neon_ci __o; } __bu = { __b }; + union { int16x8x3_t __i; __builtin_neon_ci 
__o; } __rv; + __rv.__o = __builtin_neon_vld3_lanev8hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline int32x4x3_t __attribute__ ((__always_inline__)) +vld3q_lane_s32 (const int32_t * __a, int32x4x3_t __b, const int __c) +{ + union { int32x4x3_t __i; __builtin_neon_ci __o; } __bu = { __b }; + union { int32x4x3_t __i; __builtin_neon_ci __o; } __rv; + __rv.__o = __builtin_neon_vld3_lanev4si ((const __builtin_neon_si *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline float32x4x3_t __attribute__ ((__always_inline__)) +vld3q_lane_f32 (const float32_t * __a, float32x4x3_t __b, const int __c) +{ + union { float32x4x3_t __i; __builtin_neon_ci __o; } __bu = { __b }; + union { float32x4x3_t __i; __builtin_neon_ci __o; } __rv; + __rv.__o = __builtin_neon_vld3_lanev4sf (__a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline uint16x8x3_t __attribute__ ((__always_inline__)) +vld3q_lane_u16 (const uint16_t * __a, uint16x8x3_t __b, const int __c) +{ + union { uint16x8x3_t __i; __builtin_neon_ci __o; } __bu = { __b }; + union { uint16x8x3_t __i; __builtin_neon_ci __o; } __rv; + __rv.__o = __builtin_neon_vld3_lanev8hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline uint32x4x3_t __attribute__ ((__always_inline__)) +vld3q_lane_u32 (const uint32_t * __a, uint32x4x3_t __b, const int __c) +{ + union { uint32x4x3_t __i; __builtin_neon_ci __o; } __bu = { __b }; + union { uint32x4x3_t __i; __builtin_neon_ci __o; } __rv; + __rv.__o = __builtin_neon_vld3_lanev4si ((const __builtin_neon_si *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline poly16x8x3_t __attribute__ ((__always_inline__)) +vld3q_lane_p16 (const poly16_t * __a, poly16x8x3_t __b, const int __c) +{ + union { poly16x8x3_t __i; __builtin_neon_ci __o; } __bu = { __b }; + union { poly16x8x3_t __i; __builtin_neon_ci __o; } __rv; + __rv.__o = __builtin_neon_vld3_lanev8hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline int8x8x3_t __attribute__ ((__always_inline__)) +vld3_dup_s8 (const int8_t * __a) +{ + union { int8x8x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_dupv8qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline int16x4x3_t __attribute__ ((__always_inline__)) +vld3_dup_s16 (const int16_t * __a) +{ + union { int16x4x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_dupv4hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline int32x2x3_t __attribute__ ((__always_inline__)) +vld3_dup_s32 (const int32_t * __a) +{ + union { int32x2x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_dupv2si ((const __builtin_neon_si *) __a); + return __rv.__i; +} + +__extension__ static __inline float32x2x3_t __attribute__ ((__always_inline__)) +vld3_dup_f32 (const float32_t * __a) +{ + union { float32x2x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_dupv2sf (__a); + return __rv.__i; +} + +__extension__ static __inline uint8x8x3_t __attribute__ ((__always_inline__)) +vld3_dup_u8 (const uint8_t * __a) +{ + union { uint8x8x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_dupv8qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline uint16x4x3_t __attribute__ ((__always_inline__)) +vld3_dup_u16 (const 
uint16_t * __a) +{ + union { uint16x4x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_dupv4hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline uint32x2x3_t __attribute__ ((__always_inline__)) +vld3_dup_u32 (const uint32_t * __a) +{ + union { uint32x2x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_dupv2si ((const __builtin_neon_si *) __a); + return __rv.__i; +} + +__extension__ static __inline poly8x8x3_t __attribute__ ((__always_inline__)) +vld3_dup_p8 (const poly8_t * __a) +{ + union { poly8x8x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_dupv8qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline poly16x4x3_t __attribute__ ((__always_inline__)) +vld3_dup_p16 (const poly16_t * __a) +{ + union { poly16x4x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_dupv4hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline int64x1x3_t __attribute__ ((__always_inline__)) +vld3_dup_s64 (const int64_t * __a) +{ + union { int64x1x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_dupdi ((const __builtin_neon_di *) __a); + return __rv.__i; +} + +__extension__ static __inline uint64x1x3_t __attribute__ ((__always_inline__)) +vld3_dup_u64 (const uint64_t * __a) +{ + union { uint64x1x3_t __i; __builtin_neon_ei __o; } __rv; + __rv.__o = __builtin_neon_vld3_dupdi ((const __builtin_neon_di *) __a); + return __rv.__i; +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3_s8 (int8_t * __a, int8x8x3_t __b) +{ + union { int8x8x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + __builtin_neon_vst3v8qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3_s16 (int16_t * __a, int16x4x3_t __b) +{ + union { int16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + __builtin_neon_vst3v4hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3_s32 (int32_t * __a, int32x2x3_t __b) +{ + union { int32x2x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + __builtin_neon_vst3v2si ((__builtin_neon_si *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3_f32 (float32_t * __a, float32x2x3_t __b) +{ + union { float32x2x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + __builtin_neon_vst3v2sf (__a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3_u8 (uint8_t * __a, uint8x8x3_t __b) +{ + union { uint8x8x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + __builtin_neon_vst3v8qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3_u16 (uint16_t * __a, uint16x4x3_t __b) +{ + union { uint16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + __builtin_neon_vst3v4hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3_u32 (uint32_t * __a, uint32x2x3_t __b) +{ + union { uint32x2x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + __builtin_neon_vst3v2si ((__builtin_neon_si *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3_p8 (poly8_t * __a, poly8x8x3_t __b) +{ + union { poly8x8x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + __builtin_neon_vst3v8qi 
((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3_p16 (poly16_t * __a, poly16x4x3_t __b) +{ + union { poly16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + __builtin_neon_vst3v4hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3_s64 (int64_t * __a, int64x1x3_t __b) +{ + union { int64x1x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + __builtin_neon_vst3di ((__builtin_neon_di *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3_u64 (uint64_t * __a, uint64x1x3_t __b) +{ + union { uint64x1x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + __builtin_neon_vst3di ((__builtin_neon_di *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3q_s8 (int8_t * __a, int8x16x3_t __b) +{ + union { int8x16x3_t __i; __builtin_neon_ci __o; } __bu = { __b }; + __builtin_neon_vst3v16qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3q_s16 (int16_t * __a, int16x8x3_t __b) +{ + union { int16x8x3_t __i; __builtin_neon_ci __o; } __bu = { __b }; + __builtin_neon_vst3v8hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3q_s32 (int32_t * __a, int32x4x3_t __b) +{ + union { int32x4x3_t __i; __builtin_neon_ci __o; } __bu = { __b }; + __builtin_neon_vst3v4si ((__builtin_neon_si *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3q_f32 (float32_t * __a, float32x4x3_t __b) +{ + union { float32x4x3_t __i; __builtin_neon_ci __o; } __bu = { __b }; + __builtin_neon_vst3v4sf (__a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3q_u8 (uint8_t * __a, uint8x16x3_t __b) +{ + union { uint8x16x3_t __i; __builtin_neon_ci __o; } __bu = { __b }; + __builtin_neon_vst3v16qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3q_u16 (uint16_t * __a, uint16x8x3_t __b) +{ + union { uint16x8x3_t __i; __builtin_neon_ci __o; } __bu = { __b }; + __builtin_neon_vst3v8hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3q_u32 (uint32_t * __a, uint32x4x3_t __b) +{ + union { uint32x4x3_t __i; __builtin_neon_ci __o; } __bu = { __b }; + __builtin_neon_vst3v4si ((__builtin_neon_si *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3q_p8 (poly8_t * __a, poly8x16x3_t __b) +{ + union { poly8x16x3_t __i; __builtin_neon_ci __o; } __bu = { __b }; + __builtin_neon_vst3v16qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3q_p16 (poly16_t * __a, poly16x8x3_t __b) +{ + union { poly16x8x3_t __i; __builtin_neon_ci __o; } __bu = { __b }; + __builtin_neon_vst3v8hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3_lane_s8 (int8_t * __a, int8x8x3_t __b, const int __c) +{ + union { int8x8x3_t __i; __builtin_neon_ei __o; } __bu = { __b }; + __builtin_neon_vst3_lanev8qi ((__builtin_neon_qi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst3_lane_s16 (int16_t * __a, int16x4x3_t __b, const int __c) +{ + union { 
int16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst3_lanev4hi ((__builtin_neon_hi *) __a, __bu.__o, __c);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3_lane_s32 (int32_t * __a, int32x2x3_t __b, const int __c)
+{
+  union { int32x2x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst3_lanev2si ((__builtin_neon_si *) __a, __bu.__o, __c);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3_lane_f32 (float32_t * __a, float32x2x3_t __b, const int __c)
+{
+  union { float32x2x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst3_lanev2sf (__a, __bu.__o, __c);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3_lane_u8 (uint8_t * __a, uint8x8x3_t __b, const int __c)
+{
+  union { uint8x8x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst3_lanev8qi ((__builtin_neon_qi *) __a, __bu.__o, __c);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3_lane_u16 (uint16_t * __a, uint16x4x3_t __b, const int __c)
+{
+  union { uint16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst3_lanev4hi ((__builtin_neon_hi *) __a, __bu.__o, __c);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3_lane_u32 (uint32_t * __a, uint32x2x3_t __b, const int __c)
+{
+  union { uint32x2x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst3_lanev2si ((__builtin_neon_si *) __a, __bu.__o, __c);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3_lane_p8 (poly8_t * __a, poly8x8x3_t __b, const int __c)
+{
+  union { poly8x8x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst3_lanev8qi ((__builtin_neon_qi *) __a, __bu.__o, __c);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3_lane_p16 (poly16_t * __a, poly16x4x3_t __b, const int __c)
+{
+  union { poly16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst3_lanev4hi ((__builtin_neon_hi *) __a, __bu.__o, __c);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3q_lane_s16 (int16_t * __a, int16x8x3_t __b, const int __c)
+{
+  union { int16x8x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst3_lanev8hi ((__builtin_neon_hi *) __a, __bu.__o, __c);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3q_lane_s32 (int32_t * __a, int32x4x3_t __b, const int __c)
+{
+  union { int32x4x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst3_lanev4si ((__builtin_neon_si *) __a, __bu.__o, __c);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3q_lane_f32 (float32_t * __a, float32x4x3_t __b, const int __c)
+{
+  union { float32x4x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst3_lanev4sf (__a, __bu.__o, __c);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3q_lane_u16 (uint16_t * __a, uint16x8x3_t __b, const int __c)
+{
+  union { uint16x8x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst3_lanev8hi ((__builtin_neon_hi *) __a, __bu.__o, __c);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3q_lane_u32 (uint32_t * __a, uint32x4x3_t __b, const int __c)
+{
+  union { uint32x4x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst3_lanev4si ((__builtin_neon_si *) __a, __bu.__o, __c);
+}
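Each wrapper above funnels the multi-vector struct argument through a union into the opaque __builtin_neon_ei/__builtin_neon_ci mode the builtin expects, and __always_inline ensures no call overhead survives. As a usage sketch of how the plain vld3/vst3 pairs defined earlier fit together (the function clear_green and its 24-byte buffer layout are hypothetical, not part of the patch):

#include <arm_neon.h>

/* De-interleave 8 RGB pixels (24 bytes), zero the green channel,
   and re-interleave back to memory.  */
void
clear_green (uint8_t *rgb)
{
  uint8x8x3_t pix = vld3_u8 (rgb);  /* VLD3.8: pix.val[0]=R, [1]=G, [2]=B */
  pix.val[1] = vdup_n_u8 (0);       /* zero all eight green lanes */
  vst3_u8 (rgb, pix);               /* VST3.8: interleave and store */
}

The lane variants (vst3_lane_*, vst3q_lane_*) instead store element __c of each of the three vectors, interleaved, rather than the whole structure.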
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3q_lane_p16 (poly16_t * __a, poly16x8x3_t __b, const int __c)
+{
+  union { poly16x8x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst3_lanev8hi ((__builtin_neon_hi *) __a, __bu.__o, __c);
+}
+
+__extension__ static __inline int8x8x4_t __attribute__ ((__always_inline__))
+vld4_s8 (const int8_t * __a)
+{
+  union { int8x8x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4v8qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ static __inline int16x4x4_t __attribute__ ((__always_inline__))
+vld4_s16 (const int16_t * __a)
+{
+  union { int16x4x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4v4hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ static __inline int32x2x4_t __attribute__ ((__always_inline__))
+vld4_s32 (const int32_t * __a)
+{
+  union { int32x2x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4v2si ((const __builtin_neon_si *) __a);
+  return __rv.__i;
+}
+
+__extension__ static __inline float32x2x4_t __attribute__ ((__always_inline__))
+vld4_f32 (const float32_t * __a)
+{
+  union { float32x2x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4v2sf (__a);
+  return __rv.__i;
+}
+
+__extension__ static __inline uint8x8x4_t __attribute__ ((__always_inline__))
+vld4_u8 (const uint8_t * __a)
+{
+  union { uint8x8x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4v8qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ static __inline uint16x4x4_t __attribute__ ((__always_inline__))
+vld4_u16 (const uint16_t * __a)
+{
+  union { uint16x4x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4v4hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ static __inline uint32x2x4_t __attribute__ ((__always_inline__))
+vld4_u32 (const uint32_t * __a)
+{
+  union { uint32x2x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4v2si ((const __builtin_neon_si *) __a);
+  return __rv.__i;
+}
+
+__extension__ static __inline poly8x8x4_t __attribute__ ((__always_inline__))
+vld4_p8 (const poly8_t * __a)
+{
+  union { poly8x8x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4v8qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ static __inline poly16x4x4_t __attribute__ ((__always_inline__))
+vld4_p16 (const poly16_t * __a)
+{
+  union { poly16x4x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4v4hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ static __inline int64x1x4_t __attribute__ ((__always_inline__))
+vld4_s64 (const int64_t * __a)
+{
+  union { int64x1x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
+__extension__ static __inline uint64x1x4_t __attribute__ ((__always_inline__))
+vld4_u64 (const uint64_t * __a)
+{
+  union { uint64x1x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
+__extension__ static __inline int8x16x4_t __attribute__ ((__always_inline__))
+vld4q_s8 (const int8_t * __a)
+{
+  union { int8x16x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4v16qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ static __inline int16x8x4_t __attribute__
((__always_inline__)) +vld4q_s16 (const int16_t * __a) +{ + union { int16x8x4_t __i; __builtin_neon_xi __o; } __rv; + __rv.__o = __builtin_neon_vld4v8hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline int32x4x4_t __attribute__ ((__always_inline__)) +vld4q_s32 (const int32_t * __a) +{ + union { int32x4x4_t __i; __builtin_neon_xi __o; } __rv; + __rv.__o = __builtin_neon_vld4v4si ((const __builtin_neon_si *) __a); + return __rv.__i; +} + +__extension__ static __inline float32x4x4_t __attribute__ ((__always_inline__)) +vld4q_f32 (const float32_t * __a) +{ + union { float32x4x4_t __i; __builtin_neon_xi __o; } __rv; + __rv.__o = __builtin_neon_vld4v4sf (__a); + return __rv.__i; +} + +__extension__ static __inline uint8x16x4_t __attribute__ ((__always_inline__)) +vld4q_u8 (const uint8_t * __a) +{ + union { uint8x16x4_t __i; __builtin_neon_xi __o; } __rv; + __rv.__o = __builtin_neon_vld4v16qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline uint16x8x4_t __attribute__ ((__always_inline__)) +vld4q_u16 (const uint16_t * __a) +{ + union { uint16x8x4_t __i; __builtin_neon_xi __o; } __rv; + __rv.__o = __builtin_neon_vld4v8hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline uint32x4x4_t __attribute__ ((__always_inline__)) +vld4q_u32 (const uint32_t * __a) +{ + union { uint32x4x4_t __i; __builtin_neon_xi __o; } __rv; + __rv.__o = __builtin_neon_vld4v4si ((const __builtin_neon_si *) __a); + return __rv.__i; +} + +__extension__ static __inline poly8x16x4_t __attribute__ ((__always_inline__)) +vld4q_p8 (const poly8_t * __a) +{ + union { poly8x16x4_t __i; __builtin_neon_xi __o; } __rv; + __rv.__o = __builtin_neon_vld4v16qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline poly16x8x4_t __attribute__ ((__always_inline__)) +vld4q_p16 (const poly16_t * __a) +{ + union { poly16x8x4_t __i; __builtin_neon_xi __o; } __rv; + __rv.__o = __builtin_neon_vld4v8hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline int8x8x4_t __attribute__ ((__always_inline__)) +vld4_lane_s8 (const int8_t * __a, int8x8x4_t __b, const int __c) +{ + union { int8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + union { int8x8x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_lanev8qi ((const __builtin_neon_qi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline int16x4x4_t __attribute__ ((__always_inline__)) +vld4_lane_s16 (const int16_t * __a, int16x4x4_t __b, const int __c) +{ + union { int16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + union { int16x4x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_lanev4hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline int32x2x4_t __attribute__ ((__always_inline__)) +vld4_lane_s32 (const int32_t * __a, int32x2x4_t __b, const int __c) +{ + union { int32x2x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + union { int32x2x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_lanev2si ((const __builtin_neon_si *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline float32x2x4_t __attribute__ ((__always_inline__)) +vld4_lane_f32 (const float32_t * __a, float32x2x4_t __b, const int __c) +{ + union { float32x2x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + union { float32x2x4_t __i; __builtin_neon_oi __o; } __rv; + 
__rv.__o = __builtin_neon_vld4_lanev2sf (__a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline uint8x8x4_t __attribute__ ((__always_inline__)) +vld4_lane_u8 (const uint8_t * __a, uint8x8x4_t __b, const int __c) +{ + union { uint8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + union { uint8x8x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_lanev8qi ((const __builtin_neon_qi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline uint16x4x4_t __attribute__ ((__always_inline__)) +vld4_lane_u16 (const uint16_t * __a, uint16x4x4_t __b, const int __c) +{ + union { uint16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + union { uint16x4x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_lanev4hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline uint32x2x4_t __attribute__ ((__always_inline__)) +vld4_lane_u32 (const uint32_t * __a, uint32x2x4_t __b, const int __c) +{ + union { uint32x2x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + union { uint32x2x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_lanev2si ((const __builtin_neon_si *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline poly8x8x4_t __attribute__ ((__always_inline__)) +vld4_lane_p8 (const poly8_t * __a, poly8x8x4_t __b, const int __c) +{ + union { poly8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + union { poly8x8x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_lanev8qi ((const __builtin_neon_qi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline poly16x4x4_t __attribute__ ((__always_inline__)) +vld4_lane_p16 (const poly16_t * __a, poly16x4x4_t __b, const int __c) +{ + union { poly16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + union { poly16x4x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_lanev4hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline int16x8x4_t __attribute__ ((__always_inline__)) +vld4q_lane_s16 (const int16_t * __a, int16x8x4_t __b, const int __c) +{ + union { int16x8x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + union { int16x8x4_t __i; __builtin_neon_xi __o; } __rv; + __rv.__o = __builtin_neon_vld4_lanev8hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline int32x4x4_t __attribute__ ((__always_inline__)) +vld4q_lane_s32 (const int32_t * __a, int32x4x4_t __b, const int __c) +{ + union { int32x4x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + union { int32x4x4_t __i; __builtin_neon_xi __o; } __rv; + __rv.__o = __builtin_neon_vld4_lanev4si ((const __builtin_neon_si *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline float32x4x4_t __attribute__ ((__always_inline__)) +vld4q_lane_f32 (const float32_t * __a, float32x4x4_t __b, const int __c) +{ + union { float32x4x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + union { float32x4x4_t __i; __builtin_neon_xi __o; } __rv; + __rv.__o = __builtin_neon_vld4_lanev4sf (__a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline uint16x8x4_t __attribute__ ((__always_inline__)) +vld4q_lane_u16 (const uint16_t * __a, uint16x8x4_t __b, const int __c) +{ + union { uint16x8x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + union { uint16x8x4_t __i; __builtin_neon_xi __o; } __rv; + __rv.__o = __builtin_neon_vld4_lanev8hi 
((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline uint32x4x4_t __attribute__ ((__always_inline__)) +vld4q_lane_u32 (const uint32_t * __a, uint32x4x4_t __b, const int __c) +{ + union { uint32x4x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + union { uint32x4x4_t __i; __builtin_neon_xi __o; } __rv; + __rv.__o = __builtin_neon_vld4_lanev4si ((const __builtin_neon_si *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline poly16x8x4_t __attribute__ ((__always_inline__)) +vld4q_lane_p16 (const poly16_t * __a, poly16x8x4_t __b, const int __c) +{ + union { poly16x8x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + union { poly16x8x4_t __i; __builtin_neon_xi __o; } __rv; + __rv.__o = __builtin_neon_vld4_lanev8hi ((const __builtin_neon_hi *) __a, __bu.__o, __c); + return __rv.__i; +} + +__extension__ static __inline int8x8x4_t __attribute__ ((__always_inline__)) +vld4_dup_s8 (const int8_t * __a) +{ + union { int8x8x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_dupv8qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline int16x4x4_t __attribute__ ((__always_inline__)) +vld4_dup_s16 (const int16_t * __a) +{ + union { int16x4x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_dupv4hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline int32x2x4_t __attribute__ ((__always_inline__)) +vld4_dup_s32 (const int32_t * __a) +{ + union { int32x2x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_dupv2si ((const __builtin_neon_si *) __a); + return __rv.__i; +} + +__extension__ static __inline float32x2x4_t __attribute__ ((__always_inline__)) +vld4_dup_f32 (const float32_t * __a) +{ + union { float32x2x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_dupv2sf (__a); + return __rv.__i; +} + +__extension__ static __inline uint8x8x4_t __attribute__ ((__always_inline__)) +vld4_dup_u8 (const uint8_t * __a) +{ + union { uint8x8x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_dupv8qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline uint16x4x4_t __attribute__ ((__always_inline__)) +vld4_dup_u16 (const uint16_t * __a) +{ + union { uint16x4x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_dupv4hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline uint32x2x4_t __attribute__ ((__always_inline__)) +vld4_dup_u32 (const uint32_t * __a) +{ + union { uint32x2x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_dupv2si ((const __builtin_neon_si *) __a); + return __rv.__i; +} + +__extension__ static __inline poly8x8x4_t __attribute__ ((__always_inline__)) +vld4_dup_p8 (const poly8_t * __a) +{ + union { poly8x8x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_dupv8qi ((const __builtin_neon_qi *) __a); + return __rv.__i; +} + +__extension__ static __inline poly16x4x4_t __attribute__ ((__always_inline__)) +vld4_dup_p16 (const poly16_t * __a) +{ + union { poly16x4x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_dupv4hi ((const __builtin_neon_hi *) __a); + return __rv.__i; +} + +__extension__ static __inline int64x1x4_t __attribute__ ((__always_inline__)) +vld4_dup_s64 (const int64_t * __a) +{ + union { int64x1x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_dupdi 
((const __builtin_neon_di *) __a); + return __rv.__i; +} + +__extension__ static __inline uint64x1x4_t __attribute__ ((__always_inline__)) +vld4_dup_u64 (const uint64_t * __a) +{ + union { uint64x1x4_t __i; __builtin_neon_oi __o; } __rv; + __rv.__o = __builtin_neon_vld4_dupdi ((const __builtin_neon_di *) __a); + return __rv.__i; +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_s8 (int8_t * __a, int8x8x4_t __b) +{ + union { int8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4v8qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_s16 (int16_t * __a, int16x4x4_t __b) +{ + union { int16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4v4hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_s32 (int32_t * __a, int32x2x4_t __b) +{ + union { int32x2x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4v2si ((__builtin_neon_si *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_f32 (float32_t * __a, float32x2x4_t __b) +{ + union { float32x2x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4v2sf (__a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_u8 (uint8_t * __a, uint8x8x4_t __b) +{ + union { uint8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4v8qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_u16 (uint16_t * __a, uint16x4x4_t __b) +{ + union { uint16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4v4hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_u32 (uint32_t * __a, uint32x2x4_t __b) +{ + union { uint32x2x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4v2si ((__builtin_neon_si *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_p8 (poly8_t * __a, poly8x8x4_t __b) +{ + union { poly8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4v8qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_p16 (poly16_t * __a, poly16x4x4_t __b) +{ + union { poly16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4v4hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_s64 (int64_t * __a, int64x1x4_t __b) +{ + union { int64x1x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4di ((__builtin_neon_di *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_u64 (uint64_t * __a, uint64x1x4_t __b) +{ + union { uint64x1x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4di ((__builtin_neon_di *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4q_s8 (int8_t * __a, int8x16x4_t __b) +{ + union { int8x16x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + __builtin_neon_vst4v16qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4q_s16 (int16_t * __a, int16x8x4_t __b) +{ + union { int16x8x4_t __i; 
__builtin_neon_xi __o; } __bu = { __b }; + __builtin_neon_vst4v8hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4q_s32 (int32_t * __a, int32x4x4_t __b) +{ + union { int32x4x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + __builtin_neon_vst4v4si ((__builtin_neon_si *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4q_f32 (float32_t * __a, float32x4x4_t __b) +{ + union { float32x4x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + __builtin_neon_vst4v4sf (__a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4q_u8 (uint8_t * __a, uint8x16x4_t __b) +{ + union { uint8x16x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + __builtin_neon_vst4v16qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4q_u16 (uint16_t * __a, uint16x8x4_t __b) +{ + union { uint16x8x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + __builtin_neon_vst4v8hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4q_u32 (uint32_t * __a, uint32x4x4_t __b) +{ + union { uint32x4x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + __builtin_neon_vst4v4si ((__builtin_neon_si *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4q_p8 (poly8_t * __a, poly8x16x4_t __b) +{ + union { poly8x16x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + __builtin_neon_vst4v16qi ((__builtin_neon_qi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4q_p16 (poly16_t * __a, poly16x8x4_t __b) +{ + union { poly16x8x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + __builtin_neon_vst4v8hi ((__builtin_neon_hi *) __a, __bu.__o); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_lane_s8 (int8_t * __a, int8x8x4_t __b, const int __c) +{ + union { int8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4_lanev8qi ((__builtin_neon_qi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_lane_s16 (int16_t * __a, int16x4x4_t __b, const int __c) +{ + union { int16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4_lanev4hi ((__builtin_neon_hi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_lane_s32 (int32_t * __a, int32x2x4_t __b, const int __c) +{ + union { int32x2x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4_lanev2si ((__builtin_neon_si *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_lane_f32 (float32_t * __a, float32x2x4_t __b, const int __c) +{ + union { float32x2x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4_lanev2sf (__a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_lane_u8 (uint8_t * __a, uint8x8x4_t __b, const int __c) +{ + union { uint8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4_lanev8qi ((__builtin_neon_qi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_lane_u16 (uint16_t * __a, uint16x4x4_t __b, const int __c) +{ + union { uint16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + 
__builtin_neon_vst4_lanev4hi ((__builtin_neon_hi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_lane_u32 (uint32_t * __a, uint32x2x4_t __b, const int __c) +{ + union { uint32x2x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4_lanev2si ((__builtin_neon_si *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_lane_p8 (poly8_t * __a, poly8x8x4_t __b, const int __c) +{ + union { poly8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4_lanev8qi ((__builtin_neon_qi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4_lane_p16 (poly16_t * __a, poly16x4x4_t __b, const int __c) +{ + union { poly16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b }; + __builtin_neon_vst4_lanev4hi ((__builtin_neon_hi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4q_lane_s16 (int16_t * __a, int16x8x4_t __b, const int __c) +{ + union { int16x8x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + __builtin_neon_vst4_lanev8hi ((__builtin_neon_hi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4q_lane_s32 (int32_t * __a, int32x4x4_t __b, const int __c) +{ + union { int32x4x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + __builtin_neon_vst4_lanev4si ((__builtin_neon_si *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4q_lane_f32 (float32_t * __a, float32x4x4_t __b, const int __c) +{ + union { float32x4x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + __builtin_neon_vst4_lanev4sf (__a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4q_lane_u16 (uint16_t * __a, uint16x8x4_t __b, const int __c) +{ + union { uint16x8x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + __builtin_neon_vst4_lanev8hi ((__builtin_neon_hi *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4q_lane_u32 (uint32_t * __a, uint32x4x4_t __b, const int __c) +{ + union { uint32x4x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + __builtin_neon_vst4_lanev4si ((__builtin_neon_si *) __a, __bu.__o, __c); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) +vst4q_lane_p16 (poly16_t * __a, poly16x8x4_t __b, const int __c) +{ + union { poly16x8x4_t __i; __builtin_neon_xi __o; } __bu = { __b }; + __builtin_neon_vst4_lanev8hi ((__builtin_neon_hi *) __a, __bu.__o, __c); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vand_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vandv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vand_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vandv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vand_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vandv2si (__a, __b, 1); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vand_s64 (int64x1_t __a, int64x1_t __b) +{ + return (int64x1_t)__builtin_neon_vanddi (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vand_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vandv8qi 
((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vand_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vandv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vand_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vandv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vand_u64 (uint64x1_t __a, uint64x1_t __b) +{ + return (uint64x1_t)__builtin_neon_vanddi ((int64x1_t) __a, (int64x1_t) __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vandq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vandv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vandq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vandv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vandq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vandv4si (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vandq_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int64x2_t)__builtin_neon_vandv2di (__a, __b, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vandq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vandv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vandq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vandv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vandq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vandv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vandq_u64 (uint64x2_t __a, uint64x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vandv2di ((int64x2_t) __a, (int64x2_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vorr_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vorrv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vorr_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vorrv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vorr_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vorrv2si (__a, __b, 1); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vorr_s64 (int64x1_t __a, int64x1_t __b) +{ + return (int64x1_t)__builtin_neon_vorrdi (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vorr_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vorrv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vorr_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vorrv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vorr_u32 (uint32x2_t __a, uint32x2_t __b) +{ 
+ return (uint32x2_t)__builtin_neon_vorrv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vorr_u64 (uint64x1_t __a, uint64x1_t __b) +{ + return (uint64x1_t)__builtin_neon_vorrdi ((int64x1_t) __a, (int64x1_t) __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vorrq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vorrv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vorrq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vorrv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vorrq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vorrv4si (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vorrq_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int64x2_t)__builtin_neon_vorrv2di (__a, __b, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vorrq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vorrv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vorrq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vorrv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vorrq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vorrv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vorrq_u64 (uint64x2_t __a, uint64x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vorrv2di ((int64x2_t) __a, (int64x2_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +veor_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_veorv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +veor_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_veorv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +veor_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_veorv2si (__a, __b, 1); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +veor_s64 (int64x1_t __a, int64x1_t __b) +{ + return (int64x1_t)__builtin_neon_veordi (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +veor_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_veorv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +veor_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_veorv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +veor_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_veorv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +veor_u64 (uint64x1_t __a, uint64x1_t __b) +{ + return (uint64x1_t)__builtin_neon_veordi ((int64x1_t) __a, (int64x1_t) __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) 
+veorq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_veorv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +veorq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_veorv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +veorq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_veorv4si (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +veorq_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int64x2_t)__builtin_neon_veorv2di (__a, __b, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +veorq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_veorv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +veorq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_veorv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +veorq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_veorv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +veorq_u64 (uint64x2_t __a, uint64x2_t __b) +{ + return (uint64x2_t)__builtin_neon_veorv2di ((int64x2_t) __a, (int64x2_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vbic_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vbicv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vbic_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vbicv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vbic_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vbicv2si (__a, __b, 1); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vbic_s64 (int64x1_t __a, int64x1_t __b) +{ + return (int64x1_t)__builtin_neon_vbicdi (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vbic_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vbicv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vbic_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vbicv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vbic_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vbicv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vbic_u64 (uint64x1_t __a, uint64x1_t __b) +{ + return (uint64x1_t)__builtin_neon_vbicdi ((int64x1_t) __a, (int64x1_t) __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vbicq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vbicv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vbicq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vbicv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) 
+vbicq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vbicv4si (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vbicq_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int64x2_t)__builtin_neon_vbicv2di (__a, __b, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vbicq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vbicv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vbicq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vbicv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vbicq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vbicv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vbicq_u64 (uint64x2_t __a, uint64x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vbicv2di ((int64x2_t) __a, (int64x2_t) __b, 0); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vorn_s8 (int8x8_t __a, int8x8_t __b) +{ + return (int8x8_t)__builtin_neon_vornv8qi (__a, __b, 1); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vorn_s16 (int16x4_t __a, int16x4_t __b) +{ + return (int16x4_t)__builtin_neon_vornv4hi (__a, __b, 1); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vorn_s32 (int32x2_t __a, int32x2_t __b) +{ + return (int32x2_t)__builtin_neon_vornv2si (__a, __b, 1); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vorn_s64 (int64x1_t __a, int64x1_t __b) +{ + return (int64x1_t)__builtin_neon_vorndi (__a, __b, 1); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vorn_u8 (uint8x8_t __a, uint8x8_t __b) +{ + return (uint8x8_t)__builtin_neon_vornv8qi ((int8x8_t) __a, (int8x8_t) __b, 0); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vorn_u16 (uint16x4_t __a, uint16x4_t __b) +{ + return (uint16x4_t)__builtin_neon_vornv4hi ((int16x4_t) __a, (int16x4_t) __b, 0); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vorn_u32 (uint32x2_t __a, uint32x2_t __b) +{ + return (uint32x2_t)__builtin_neon_vornv2si ((int32x2_t) __a, (int32x2_t) __b, 0); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vorn_u64 (uint64x1_t __a, uint64x1_t __b) +{ + return (uint64x1_t)__builtin_neon_vorndi ((int64x1_t) __a, (int64x1_t) __b, 0); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vornq_s8 (int8x16_t __a, int8x16_t __b) +{ + return (int8x16_t)__builtin_neon_vornv16qi (__a, __b, 1); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vornq_s16 (int16x8_t __a, int16x8_t __b) +{ + return (int16x8_t)__builtin_neon_vornv8hi (__a, __b, 1); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vornq_s32 (int32x4_t __a, int32x4_t __b) +{ + return (int32x4_t)__builtin_neon_vornv4si (__a, __b, 1); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vornq_s64 (int64x2_t __a, int64x2_t __b) +{ + return (int64x2_t)__builtin_neon_vornv2di (__a, __b, 1); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) 
+vornq_u8 (uint8x16_t __a, uint8x16_t __b) +{ + return (uint8x16_t)__builtin_neon_vornv16qi ((int8x16_t) __a, (int8x16_t) __b, 0); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vornq_u16 (uint16x8_t __a, uint16x8_t __b) +{ + return (uint16x8_t)__builtin_neon_vornv8hi ((int16x8_t) __a, (int16x8_t) __b, 0); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vornq_u32 (uint32x4_t __a, uint32x4_t __b) +{ + return (uint32x4_t)__builtin_neon_vornv4si ((int32x4_t) __a, (int32x4_t) __b, 0); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vornq_u64 (uint64x2_t __a, uint64x2_t __b) +{ + return (uint64x2_t)__builtin_neon_vornv2di ((int64x2_t) __a, (int64x2_t) __b, 0); +} + + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vreinterpret_p8_s8 (int8x8_t __a) +{ + return (poly8x8_t)__builtin_neon_vreinterpretv8qiv8qi (__a); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vreinterpret_p8_s16 (int16x4_t __a) +{ + return (poly8x8_t)__builtin_neon_vreinterpretv8qiv4hi (__a); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vreinterpret_p8_s32 (int32x2_t __a) +{ + return (poly8x8_t)__builtin_neon_vreinterpretv8qiv2si (__a); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vreinterpret_p8_s64 (int64x1_t __a) +{ + return (poly8x8_t)__builtin_neon_vreinterpretv8qidi (__a); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vreinterpret_p8_f32 (float32x2_t __a) +{ + return (poly8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vreinterpret_p8_u8 (uint8x8_t __a) +{ + return (poly8x8_t)__builtin_neon_vreinterpretv8qiv8qi ((int8x8_t) __a); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vreinterpret_p8_u16 (uint16x4_t __a) +{ + return (poly8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vreinterpret_p8_u32 (uint32x2_t __a) +{ + return (poly8x8_t)__builtin_neon_vreinterpretv8qiv2si ((int32x2_t) __a); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vreinterpret_p8_u64 (uint64x1_t __a) +{ + return (poly8x8_t)__builtin_neon_vreinterpretv8qidi ((int64x1_t) __a); +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vreinterpret_p8_p16 (poly16x4_t __a) +{ + return (poly8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_p8_s8 (int8x16_t __a) +{ + return (poly8x16_t)__builtin_neon_vreinterpretv16qiv16qi (__a); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_p8_s16 (int16x8_t __a) +{ + return (poly8x16_t)__builtin_neon_vreinterpretv16qiv8hi (__a); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_p8_s32 (int32x4_t __a) +{ + return (poly8x16_t)__builtin_neon_vreinterpretv16qiv4si (__a); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_p8_s64 (int64x2_t __a) +{ + return (poly8x16_t)__builtin_neon_vreinterpretv16qiv2di (__a); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_p8_f32 
(float32x4_t __a) +{ + return (poly8x16_t)__builtin_neon_vreinterpretv16qiv4sf (__a); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_p8_u8 (uint8x16_t __a) +{ + return (poly8x16_t)__builtin_neon_vreinterpretv16qiv16qi ((int8x16_t) __a); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_p8_u16 (uint16x8_t __a) +{ + return (poly8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_p8_u32 (uint32x4_t __a) +{ + return (poly8x16_t)__builtin_neon_vreinterpretv16qiv4si ((int32x4_t) __a); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_p8_u64 (uint64x2_t __a) +{ + return (poly8x16_t)__builtin_neon_vreinterpretv16qiv2di ((int64x2_t) __a); +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_p8_p16 (poly16x8_t __a) +{ + return (poly8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a); +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vreinterpret_p16_s8 (int8x8_t __a) +{ + return (poly16x4_t)__builtin_neon_vreinterpretv4hiv8qi (__a); +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vreinterpret_p16_s16 (int16x4_t __a) +{ + return (poly16x4_t)__builtin_neon_vreinterpretv4hiv4hi (__a); +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vreinterpret_p16_s32 (int32x2_t __a) +{ + return (poly16x4_t)__builtin_neon_vreinterpretv4hiv2si (__a); +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vreinterpret_p16_s64 (int64x1_t __a) +{ + return (poly16x4_t)__builtin_neon_vreinterpretv4hidi (__a); +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vreinterpret_p16_f32 (float32x2_t __a) +{ + return (poly16x4_t)__builtin_neon_vreinterpretv4hiv2sf (__a); +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vreinterpret_p16_u8 (uint8x8_t __a) +{ + return (poly16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a); +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vreinterpret_p16_u16 (uint16x4_t __a) +{ + return (poly16x4_t)__builtin_neon_vreinterpretv4hiv4hi ((int16x4_t) __a); +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vreinterpret_p16_u32 (uint32x2_t __a) +{ + return (poly16x4_t)__builtin_neon_vreinterpretv4hiv2si ((int32x2_t) __a); +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vreinterpret_p16_u64 (uint64x1_t __a) +{ + return (poly16x4_t)__builtin_neon_vreinterpretv4hidi ((int64x1_t) __a); +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vreinterpret_p16_p8 (poly8x8_t __a) +{ + return (poly16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_p16_s8 (int8x16_t __a) +{ + return (poly16x8_t)__builtin_neon_vreinterpretv8hiv16qi (__a); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_p16_s16 (int16x8_t __a) +{ + return (poly16x8_t)__builtin_neon_vreinterpretv8hiv8hi (__a); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_p16_s32 (int32x4_t __a) +{ + return 
(poly16x8_t)__builtin_neon_vreinterpretv8hiv4si (__a); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_p16_s64 (int64x2_t __a) +{ + return (poly16x8_t)__builtin_neon_vreinterpretv8hiv2di (__a); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_p16_f32 (float32x4_t __a) +{ + return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4sf (__a); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_p16_u8 (uint8x16_t __a) +{ + return (poly16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_p16_u16 (uint16x8_t __a) +{ + return (poly16x8_t)__builtin_neon_vreinterpretv8hiv8hi ((int16x8_t) __a); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_p16_u32 (uint32x4_t __a) +{ + return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) __a); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_p16_u64 (uint64x2_t __a) +{ + return (poly16x8_t)__builtin_neon_vreinterpretv8hiv2di ((int64x2_t) __a); +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_p16_p8 (poly8x16_t __a) +{ + return (poly16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vreinterpret_f32_s8 (int8x8_t __a) +{ + return (float32x2_t)__builtin_neon_vreinterpretv2sfv8qi (__a); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vreinterpret_f32_s16 (int16x4_t __a) +{ + return (float32x2_t)__builtin_neon_vreinterpretv2sfv4hi (__a); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vreinterpret_f32_s32 (int32x2_t __a) +{ + return (float32x2_t)__builtin_neon_vreinterpretv2sfv2si (__a); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vreinterpret_f32_s64 (int64x1_t __a) +{ + return (float32x2_t)__builtin_neon_vreinterpretv2sfdi (__a); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vreinterpret_f32_u8 (uint8x8_t __a) +{ + return (float32x2_t)__builtin_neon_vreinterpretv2sfv8qi ((int8x8_t) __a); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vreinterpret_f32_u16 (uint16x4_t __a) +{ + return (float32x2_t)__builtin_neon_vreinterpretv2sfv4hi ((int16x4_t) __a); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vreinterpret_f32_u32 (uint32x2_t __a) +{ + return (float32x2_t)__builtin_neon_vreinterpretv2sfv2si ((int32x2_t) __a); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vreinterpret_f32_u64 (uint64x1_t __a) +{ + return (float32x2_t)__builtin_neon_vreinterpretv2sfdi ((int64x1_t) __a); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vreinterpret_f32_p8 (poly8x8_t __a) +{ + return (float32x2_t)__builtin_neon_vreinterpretv2sfv8qi ((int8x8_t) __a); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vreinterpret_f32_p16 (poly16x4_t __a) +{ + return (float32x2_t)__builtin_neon_vreinterpretv2sfv4hi ((int16x4_t) __a); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_f32_s8 (int8x16_t __a) +{ + return 
(float32x4_t)__builtin_neon_vreinterpretv4sfv16qi (__a); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_f32_s16 (int16x8_t __a) +{ + return (float32x4_t)__builtin_neon_vreinterpretv4sfv8hi (__a); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_f32_s32 (int32x4_t __a) +{ + return (float32x4_t)__builtin_neon_vreinterpretv4sfv4si (__a); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_f32_s64 (int64x2_t __a) +{ + return (float32x4_t)__builtin_neon_vreinterpretv4sfv2di (__a); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_f32_u8 (uint8x16_t __a) +{ + return (float32x4_t)__builtin_neon_vreinterpretv4sfv16qi ((int8x16_t) __a); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_f32_u16 (uint16x8_t __a) +{ + return (float32x4_t)__builtin_neon_vreinterpretv4sfv8hi ((int16x8_t) __a); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_f32_u32 (uint32x4_t __a) +{ + return (float32x4_t)__builtin_neon_vreinterpretv4sfv4si ((int32x4_t) __a); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_f32_u64 (uint64x2_t __a) +{ + return (float32x4_t)__builtin_neon_vreinterpretv4sfv2di ((int64x2_t) __a); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_f32_p8 (poly8x16_t __a) +{ + return (float32x4_t)__builtin_neon_vreinterpretv4sfv16qi ((int8x16_t) __a); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_f32_p16 (poly16x8_t __a) +{ + return (float32x4_t)__builtin_neon_vreinterpretv4sfv8hi ((int16x8_t) __a); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vreinterpret_s64_s8 (int8x8_t __a) +{ + return (int64x1_t)__builtin_neon_vreinterpretdiv8qi (__a); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vreinterpret_s64_s16 (int16x4_t __a) +{ + return (int64x1_t)__builtin_neon_vreinterpretdiv4hi (__a); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vreinterpret_s64_s32 (int32x2_t __a) +{ + return (int64x1_t)__builtin_neon_vreinterpretdiv2si (__a); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vreinterpret_s64_f32 (float32x2_t __a) +{ + return (int64x1_t)__builtin_neon_vreinterpretdiv2sf (__a); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vreinterpret_s64_u8 (uint8x8_t __a) +{ + return (int64x1_t)__builtin_neon_vreinterpretdiv8qi ((int8x8_t) __a); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vreinterpret_s64_u16 (uint16x4_t __a) +{ + return (int64x1_t)__builtin_neon_vreinterpretdiv4hi ((int16x4_t) __a); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vreinterpret_s64_u32 (uint32x2_t __a) +{ + return (int64x1_t)__builtin_neon_vreinterpretdiv2si ((int32x2_t) __a); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vreinterpret_s64_u64 (uint64x1_t __a) +{ + return (int64x1_t)__builtin_neon_vreinterpretdidi ((int64x1_t) __a); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vreinterpret_s64_p8 (poly8x8_t __a) +{ + return (int64x1_t)__builtin_neon_vreinterpretdiv8qi 
((int8x8_t) __a); +} + +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vreinterpret_s64_p16 (poly16x4_t __a) +{ + return (int64x1_t)__builtin_neon_vreinterpretdiv4hi ((int16x4_t) __a); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_s64_s8 (int8x16_t __a) +{ + return (int64x2_t)__builtin_neon_vreinterpretv2div16qi (__a); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_s64_s16 (int16x8_t __a) +{ + return (int64x2_t)__builtin_neon_vreinterpretv2div8hi (__a); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_s64_s32 (int32x4_t __a) +{ + return (int64x2_t)__builtin_neon_vreinterpretv2div4si (__a); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_s64_f32 (float32x4_t __a) +{ + return (int64x2_t)__builtin_neon_vreinterpretv2div4sf (__a); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_s64_u8 (uint8x16_t __a) +{ + return (int64x2_t)__builtin_neon_vreinterpretv2div16qi ((int8x16_t) __a); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_s64_u16 (uint16x8_t __a) +{ + return (int64x2_t)__builtin_neon_vreinterpretv2div8hi ((int16x8_t) __a); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_s64_u32 (uint32x4_t __a) +{ + return (int64x2_t)__builtin_neon_vreinterpretv2div4si ((int32x4_t) __a); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_s64_u64 (uint64x2_t __a) +{ + return (int64x2_t)__builtin_neon_vreinterpretv2div2di ((int64x2_t) __a); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_s64_p8 (poly8x16_t __a) +{ + return (int64x2_t)__builtin_neon_vreinterpretv2div16qi ((int8x16_t) __a); +} + +__extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_s64_p16 (poly16x8_t __a) +{ + return (int64x2_t)__builtin_neon_vreinterpretv2div8hi ((int16x8_t) __a); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vreinterpret_u64_s8 (int8x8_t __a) +{ + return (uint64x1_t)__builtin_neon_vreinterpretdiv8qi (__a); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vreinterpret_u64_s16 (int16x4_t __a) +{ + return (uint64x1_t)__builtin_neon_vreinterpretdiv4hi (__a); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vreinterpret_u64_s32 (int32x2_t __a) +{ + return (uint64x1_t)__builtin_neon_vreinterpretdiv2si (__a); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vreinterpret_u64_s64 (int64x1_t __a) +{ + return (uint64x1_t)__builtin_neon_vreinterpretdidi (__a); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vreinterpret_u64_f32 (float32x2_t __a) +{ + return (uint64x1_t)__builtin_neon_vreinterpretdiv2sf (__a); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vreinterpret_u64_u8 (uint8x8_t __a) +{ + return (uint64x1_t)__builtin_neon_vreinterpretdiv8qi ((int8x8_t) __a); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vreinterpret_u64_u16 (uint16x4_t __a) +{ + return (uint64x1_t)__builtin_neon_vreinterpretdiv4hi ((int16x4_t) __a); +} + +__extension__ static __inline uint64x1_t __attribute__ 
((__always_inline__)) +vreinterpret_u64_u32 (uint32x2_t __a) +{ + return (uint64x1_t)__builtin_neon_vreinterpretdiv2si ((int32x2_t) __a); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vreinterpret_u64_p8 (poly8x8_t __a) +{ + return (uint64x1_t)__builtin_neon_vreinterpretdiv8qi ((int8x8_t) __a); +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vreinterpret_u64_p16 (poly16x4_t __a) +{ + return (uint64x1_t)__builtin_neon_vreinterpretdiv4hi ((int16x4_t) __a); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_u64_s8 (int8x16_t __a) +{ + return (uint64x2_t)__builtin_neon_vreinterpretv2div16qi (__a); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_u64_s16 (int16x8_t __a) +{ + return (uint64x2_t)__builtin_neon_vreinterpretv2div8hi (__a); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_u64_s32 (int32x4_t __a) +{ + return (uint64x2_t)__builtin_neon_vreinterpretv2div4si (__a); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_u64_s64 (int64x2_t __a) +{ + return (uint64x2_t)__builtin_neon_vreinterpretv2div2di (__a); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_u64_f32 (float32x4_t __a) +{ + return (uint64x2_t)__builtin_neon_vreinterpretv2div4sf (__a); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_u64_u8 (uint8x16_t __a) +{ + return (uint64x2_t)__builtin_neon_vreinterpretv2div16qi ((int8x16_t) __a); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_u64_u16 (uint16x8_t __a) +{ + return (uint64x2_t)__builtin_neon_vreinterpretv2div8hi ((int16x8_t) __a); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_u64_u32 (uint32x4_t __a) +{ + return (uint64x2_t)__builtin_neon_vreinterpretv2div4si ((int32x4_t) __a); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_u64_p8 (poly8x16_t __a) +{ + return (uint64x2_t)__builtin_neon_vreinterpretv2div16qi ((int8x16_t) __a); +} + +__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__)) +vreinterpretq_u64_p16 (poly16x8_t __a) +{ + return (uint64x2_t)__builtin_neon_vreinterpretv2div8hi ((int16x8_t) __a); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vreinterpret_s8_s16 (int16x4_t __a) +{ + return (int8x8_t)__builtin_neon_vreinterpretv8qiv4hi (__a); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vreinterpret_s8_s32 (int32x2_t __a) +{ + return (int8x8_t)__builtin_neon_vreinterpretv8qiv2si (__a); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vreinterpret_s8_s64 (int64x1_t __a) +{ + return (int8x8_t)__builtin_neon_vreinterpretv8qidi (__a); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vreinterpret_s8_f32 (float32x2_t __a) +{ + return (int8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vreinterpret_s8_u8 (uint8x8_t __a) +{ + return (int8x8_t)__builtin_neon_vreinterpretv8qiv8qi ((int8x8_t) __a); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vreinterpret_s8_u16 (uint16x4_t __a) +{ + return 
(int8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vreinterpret_s8_u32 (uint32x2_t __a) +{ + return (int8x8_t)__builtin_neon_vreinterpretv8qiv2si ((int32x2_t) __a); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vreinterpret_s8_u64 (uint64x1_t __a) +{ + return (int8x8_t)__builtin_neon_vreinterpretv8qidi ((int64x1_t) __a); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vreinterpret_s8_p8 (poly8x8_t __a) +{ + return (int8x8_t)__builtin_neon_vreinterpretv8qiv8qi ((int8x8_t) __a); +} + +__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) +vreinterpret_s8_p16 (poly16x4_t __a) +{ + return (int8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_s8_s16 (int16x8_t __a) +{ + return (int8x16_t)__builtin_neon_vreinterpretv16qiv8hi (__a); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_s8_s32 (int32x4_t __a) +{ + return (int8x16_t)__builtin_neon_vreinterpretv16qiv4si (__a); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_s8_s64 (int64x2_t __a) +{ + return (int8x16_t)__builtin_neon_vreinterpretv16qiv2di (__a); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_s8_f32 (float32x4_t __a) +{ + return (int8x16_t)__builtin_neon_vreinterpretv16qiv4sf (__a); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_s8_u8 (uint8x16_t __a) +{ + return (int8x16_t)__builtin_neon_vreinterpretv16qiv16qi ((int8x16_t) __a); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_s8_u16 (uint16x8_t __a) +{ + return (int8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_s8_u32 (uint32x4_t __a) +{ + return (int8x16_t)__builtin_neon_vreinterpretv16qiv4si ((int32x4_t) __a); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_s8_u64 (uint64x2_t __a) +{ + return (int8x16_t)__builtin_neon_vreinterpretv16qiv2di ((int64x2_t) __a); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_s8_p8 (poly8x16_t __a) +{ + return (int8x16_t)__builtin_neon_vreinterpretv16qiv16qi ((int8x16_t) __a); +} + +__extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_s8_p16 (poly16x8_t __a) +{ + return (int8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vreinterpret_s16_s8 (int8x8_t __a) +{ + return (int16x4_t)__builtin_neon_vreinterpretv4hiv8qi (__a); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vreinterpret_s16_s32 (int32x2_t __a) +{ + return (int16x4_t)__builtin_neon_vreinterpretv4hiv2si (__a); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vreinterpret_s16_s64 (int64x1_t __a) +{ + return (int16x4_t)__builtin_neon_vreinterpretv4hidi (__a); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vreinterpret_s16_f32 (float32x2_t __a) +{ + return (int16x4_t)__builtin_neon_vreinterpretv4hiv2sf (__a); +} + 
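/* A usage sketch -- an illustration added for this writeup, not part of the
   generated header.  Each vreinterpret variant is a pure bit-level cast: it
   typically compiles to no instruction, and simply retags the 64 bits (128
   for the "q" forms) of its argument with a new element type.  Assuming NEON
   is enabled (e.g. -mfpu=neon), the hypothetical helper below yields the
   same 64 bits viewed as eight signed bytes:

   static __inline int8x8_t
   bytes_of (int16x4_t __x)
   {
     return vreinterpret_s8_s16 (__x);
   }
*/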
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vreinterpret_s16_u8 (uint8x8_t __a) +{ + return (int16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vreinterpret_s16_u16 (uint16x4_t __a) +{ + return (int16x4_t)__builtin_neon_vreinterpretv4hiv4hi ((int16x4_t) __a); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vreinterpret_s16_u32 (uint32x2_t __a) +{ + return (int16x4_t)__builtin_neon_vreinterpretv4hiv2si ((int32x2_t) __a); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vreinterpret_s16_u64 (uint64x1_t __a) +{ + return (int16x4_t)__builtin_neon_vreinterpretv4hidi ((int64x1_t) __a); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vreinterpret_s16_p8 (poly8x8_t __a) +{ + return (int16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a); +} + +__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) +vreinterpret_s16_p16 (poly16x4_t __a) +{ + return (int16x4_t)__builtin_neon_vreinterpretv4hiv4hi ((int16x4_t) __a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_s16_s8 (int8x16_t __a) +{ + return (int16x8_t)__builtin_neon_vreinterpretv8hiv16qi (__a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_s16_s32 (int32x4_t __a) +{ + return (int16x8_t)__builtin_neon_vreinterpretv8hiv4si (__a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_s16_s64 (int64x2_t __a) +{ + return (int16x8_t)__builtin_neon_vreinterpretv8hiv2di (__a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_s16_f32 (float32x4_t __a) +{ + return (int16x8_t)__builtin_neon_vreinterpretv8hiv4sf (__a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_s16_u8 (uint8x16_t __a) +{ + return (int16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_s16_u16 (uint16x8_t __a) +{ + return (int16x8_t)__builtin_neon_vreinterpretv8hiv8hi ((int16x8_t) __a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_s16_u32 (uint32x4_t __a) +{ + return (int16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) __a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_s16_u64 (uint64x2_t __a) +{ + return (int16x8_t)__builtin_neon_vreinterpretv8hiv2di ((int64x2_t) __a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_s16_p8 (poly8x16_t __a) +{ + return (int16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a); +} + +__extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_s16_p16 (poly16x8_t __a) +{ + return (int16x8_t)__builtin_neon_vreinterpretv8hiv8hi ((int16x8_t) __a); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vreinterpret_s32_s8 (int8x8_t __a) +{ + return (int32x2_t)__builtin_neon_vreinterpretv2siv8qi (__a); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vreinterpret_s32_s16 (int16x4_t __a) +{ + return (int32x2_t)__builtin_neon_vreinterpretv2siv4hi (__a); +} + +__extension__ static __inline int32x2_t 
__attribute__ ((__always_inline__)) +vreinterpret_s32_s64 (int64x1_t __a) +{ + return (int32x2_t)__builtin_neon_vreinterpretv2sidi (__a); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vreinterpret_s32_f32 (float32x2_t __a) +{ + return (int32x2_t)__builtin_neon_vreinterpretv2siv2sf (__a); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vreinterpret_s32_u8 (uint8x8_t __a) +{ + return (int32x2_t)__builtin_neon_vreinterpretv2siv8qi ((int8x8_t) __a); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vreinterpret_s32_u16 (uint16x4_t __a) +{ + return (int32x2_t)__builtin_neon_vreinterpretv2siv4hi ((int16x4_t) __a); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vreinterpret_s32_u32 (uint32x2_t __a) +{ + return (int32x2_t)__builtin_neon_vreinterpretv2siv2si ((int32x2_t) __a); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vreinterpret_s32_u64 (uint64x1_t __a) +{ + return (int32x2_t)__builtin_neon_vreinterpretv2sidi ((int64x1_t) __a); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vreinterpret_s32_p8 (poly8x8_t __a) +{ + return (int32x2_t)__builtin_neon_vreinterpretv2siv8qi ((int8x8_t) __a); +} + +__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) +vreinterpret_s32_p16 (poly16x4_t __a) +{ + return (int32x2_t)__builtin_neon_vreinterpretv2siv4hi ((int16x4_t) __a); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_s32_s8 (int8x16_t __a) +{ + return (int32x4_t)__builtin_neon_vreinterpretv4siv16qi (__a); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_s32_s16 (int16x8_t __a) +{ + return (int32x4_t)__builtin_neon_vreinterpretv4siv8hi (__a); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_s32_s64 (int64x2_t __a) +{ + return (int32x4_t)__builtin_neon_vreinterpretv4siv2di (__a); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_s32_f32 (float32x4_t __a) +{ + return (int32x4_t)__builtin_neon_vreinterpretv4siv4sf (__a); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_s32_u8 (uint8x16_t __a) +{ + return (int32x4_t)__builtin_neon_vreinterpretv4siv16qi ((int8x16_t) __a); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_s32_u16 (uint16x8_t __a) +{ + return (int32x4_t)__builtin_neon_vreinterpretv4siv8hi ((int16x8_t) __a); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_s32_u32 (uint32x4_t __a) +{ + return (int32x4_t)__builtin_neon_vreinterpretv4siv4si ((int32x4_t) __a); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_s32_u64 (uint64x2_t __a) +{ + return (int32x4_t)__builtin_neon_vreinterpretv4siv2di ((int64x2_t) __a); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_s32_p8 (poly8x16_t __a) +{ + return (int32x4_t)__builtin_neon_vreinterpretv4siv16qi ((int8x16_t) __a); +} + +__extension__ static __inline int32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_s32_p16 (poly16x8_t __a) +{ + return (int32x4_t)__builtin_neon_vreinterpretv4siv8hi ((int16x8_t) __a); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) 
+vreinterpret_u8_s8 (int8x8_t __a) +{ + return (uint8x8_t)__builtin_neon_vreinterpretv8qiv8qi (__a); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vreinterpret_u8_s16 (int16x4_t __a) +{ + return (uint8x8_t)__builtin_neon_vreinterpretv8qiv4hi (__a); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vreinterpret_u8_s32 (int32x2_t __a) +{ + return (uint8x8_t)__builtin_neon_vreinterpretv8qiv2si (__a); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vreinterpret_u8_s64 (int64x1_t __a) +{ + return (uint8x8_t)__builtin_neon_vreinterpretv8qidi (__a); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vreinterpret_u8_f32 (float32x2_t __a) +{ + return (uint8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vreinterpret_u8_u16 (uint16x4_t __a) +{ + return (uint8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vreinterpret_u8_u32 (uint32x2_t __a) +{ + return (uint8x8_t)__builtin_neon_vreinterpretv8qiv2si ((int32x2_t) __a); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vreinterpret_u8_u64 (uint64x1_t __a) +{ + return (uint8x8_t)__builtin_neon_vreinterpretv8qidi ((int64x1_t) __a); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vreinterpret_u8_p8 (poly8x8_t __a) +{ + return (uint8x8_t)__builtin_neon_vreinterpretv8qiv8qi ((int8x8_t) __a); +} + +__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) +vreinterpret_u8_p16 (poly16x4_t __a) +{ + return (uint8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_u8_s8 (int8x16_t __a) +{ + return (uint8x16_t)__builtin_neon_vreinterpretv16qiv16qi (__a); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_u8_s16 (int16x8_t __a) +{ + return (uint8x16_t)__builtin_neon_vreinterpretv16qiv8hi (__a); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_u8_s32 (int32x4_t __a) +{ + return (uint8x16_t)__builtin_neon_vreinterpretv16qiv4si (__a); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_u8_s64 (int64x2_t __a) +{ + return (uint8x16_t)__builtin_neon_vreinterpretv16qiv2di (__a); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_u8_f32 (float32x4_t __a) +{ + return (uint8x16_t)__builtin_neon_vreinterpretv16qiv4sf (__a); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_u8_u16 (uint16x8_t __a) +{ + return (uint8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_u8_u32 (uint32x4_t __a) +{ + return (uint8x16_t)__builtin_neon_vreinterpretv16qiv4si ((int32x4_t) __a); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_u8_u64 (uint64x2_t __a) +{ + return (uint8x16_t)__builtin_neon_vreinterpretv16qiv2di ((int64x2_t) __a); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_u8_p8 (poly8x16_t __a) +{ + return 
(uint8x16_t)__builtin_neon_vreinterpretv16qiv16qi ((int8x16_t) __a); +} + +__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_u8_p16 (poly16x8_t __a) +{ + return (uint8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vreinterpret_u16_s8 (int8x8_t __a) +{ + return (uint16x4_t)__builtin_neon_vreinterpretv4hiv8qi (__a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vreinterpret_u16_s16 (int16x4_t __a) +{ + return (uint16x4_t)__builtin_neon_vreinterpretv4hiv4hi (__a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vreinterpret_u16_s32 (int32x2_t __a) +{ + return (uint16x4_t)__builtin_neon_vreinterpretv4hiv2si (__a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vreinterpret_u16_s64 (int64x1_t __a) +{ + return (uint16x4_t)__builtin_neon_vreinterpretv4hidi (__a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vreinterpret_u16_f32 (float32x2_t __a) +{ + return (uint16x4_t)__builtin_neon_vreinterpretv4hiv2sf (__a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vreinterpret_u16_u8 (uint8x8_t __a) +{ + return (uint16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vreinterpret_u16_u32 (uint32x2_t __a) +{ + return (uint16x4_t)__builtin_neon_vreinterpretv4hiv2si ((int32x2_t) __a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vreinterpret_u16_u64 (uint64x1_t __a) +{ + return (uint16x4_t)__builtin_neon_vreinterpretv4hidi ((int64x1_t) __a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vreinterpret_u16_p8 (poly8x8_t __a) +{ + return (uint16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a); +} + +__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) +vreinterpret_u16_p16 (poly16x4_t __a) +{ + return (uint16x4_t)__builtin_neon_vreinterpretv4hiv4hi ((int16x4_t) __a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_u16_s8 (int8x16_t __a) +{ + return (uint16x8_t)__builtin_neon_vreinterpretv8hiv16qi (__a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_u16_s16 (int16x8_t __a) +{ + return (uint16x8_t)__builtin_neon_vreinterpretv8hiv8hi (__a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_u16_s32 (int32x4_t __a) +{ + return (uint16x8_t)__builtin_neon_vreinterpretv8hiv4si (__a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_u16_s64 (int64x2_t __a) +{ + return (uint16x8_t)__builtin_neon_vreinterpretv8hiv2di (__a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_u16_f32 (float32x4_t __a) +{ + return (uint16x8_t)__builtin_neon_vreinterpretv8hiv4sf (__a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_u16_u8 (uint8x16_t __a) +{ + return (uint16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_u16_u32 (uint32x4_t __a) +{ + return (uint16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) 
__a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_u16_u64 (uint64x2_t __a) +{ + return (uint16x8_t)__builtin_neon_vreinterpretv8hiv2di ((int64x2_t) __a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_u16_p8 (poly8x16_t __a) +{ + return (uint16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a); +} + +__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_u16_p16 (poly16x8_t __a) +{ + return (uint16x8_t)__builtin_neon_vreinterpretv8hiv8hi ((int16x8_t) __a); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vreinterpret_u32_s8 (int8x8_t __a) +{ + return (uint32x2_t)__builtin_neon_vreinterpretv2siv8qi (__a); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vreinterpret_u32_s16 (int16x4_t __a) +{ + return (uint32x2_t)__builtin_neon_vreinterpretv2siv4hi (__a); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vreinterpret_u32_s32 (int32x2_t __a) +{ + return (uint32x2_t)__builtin_neon_vreinterpretv2siv2si (__a); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vreinterpret_u32_s64 (int64x1_t __a) +{ + return (uint32x2_t)__builtin_neon_vreinterpretv2sidi (__a); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vreinterpret_u32_f32 (float32x2_t __a) +{ + return (uint32x2_t)__builtin_neon_vreinterpretv2siv2sf (__a); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vreinterpret_u32_u8 (uint8x8_t __a) +{ + return (uint32x2_t)__builtin_neon_vreinterpretv2siv8qi ((int8x8_t) __a); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vreinterpret_u32_u16 (uint16x4_t __a) +{ + return (uint32x2_t)__builtin_neon_vreinterpretv2siv4hi ((int16x4_t) __a); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vreinterpret_u32_u64 (uint64x1_t __a) +{ + return (uint32x2_t)__builtin_neon_vreinterpretv2sidi ((int64x1_t) __a); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vreinterpret_u32_p8 (poly8x8_t __a) +{ + return (uint32x2_t)__builtin_neon_vreinterpretv2siv8qi ((int8x8_t) __a); +} + +__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) +vreinterpret_u32_p16 (poly16x4_t __a) +{ + return (uint32x2_t)__builtin_neon_vreinterpretv2siv4hi ((int16x4_t) __a); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_u32_s8 (int8x16_t __a) +{ + return (uint32x4_t)__builtin_neon_vreinterpretv4siv16qi (__a); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_u32_s16 (int16x8_t __a) +{ + return (uint32x4_t)__builtin_neon_vreinterpretv4siv8hi (__a); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_u32_s32 (int32x4_t __a) +{ + return (uint32x4_t)__builtin_neon_vreinterpretv4siv4si (__a); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_u32_s64 (int64x2_t __a) +{ + return (uint32x4_t)__builtin_neon_vreinterpretv4siv2di (__a); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_u32_f32 (float32x4_t __a) +{ + return (uint32x4_t)__builtin_neon_vreinterpretv4siv4sf (__a); +} + +__extension__ static __inline uint32x4_t __attribute__ 
((__always_inline__)) +vreinterpretq_u32_u8 (uint8x16_t __a) +{ + return (uint32x4_t)__builtin_neon_vreinterpretv4siv16qi ((int8x16_t) __a); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_u32_u16 (uint16x8_t __a) +{ + return (uint32x4_t)__builtin_neon_vreinterpretv4siv8hi ((int16x8_t) __a); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_u32_u64 (uint64x2_t __a) +{ + return (uint32x4_t)__builtin_neon_vreinterpretv4siv2di ((int64x2_t) __a); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_u32_p8 (poly8x16_t __a) +{ + return (uint32x4_t)__builtin_neon_vreinterpretv4siv16qi ((int8x16_t) __a); +} + +__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__)) +vreinterpretq_u32_p16 (poly16x8_t __a) +{ + return (uint32x4_t)__builtin_neon_vreinterpretv4siv8hi ((int16x8_t) __a); +} + +#ifdef __cplusplus +} +#endif +#endif +#endif diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md index 9a6938d..ae5429c 100644 --- a/gcc/config/arm/constraints.md +++ b/gcc/config/arm/constraints.md @@ -30,10 +30,10 @@ ;; in Thumb-1 state: I, J, K, L, M, N, O ;; The following multi-letter normal constraints have been used: -;; in ARM/Thumb-2 state: Da, Db, Dc, Dv +;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv ;; The following memory constraints have been used: -;; in ARM/Thumb-2 state: Q, Uv, Uy +;; in ARM/Thumb-2 state: Q, Ut, Uv, Uy, Un, Us ;; in ARM state: Uq @@ -164,6 +164,30 @@ (match_test "TARGET_32BIT && arm_const_double_inline_cost (op) == 4 && !(optimize_size || arm_ld_sched)"))) +(define_constraint "Dn" + "@internal + In ARM/Thumb-2 state a const_vector which can be loaded with a Neon vmov + immediate instruction." + (and (match_code "const_vector") + (match_test "TARGET_32BIT + && imm_for_neon_mov_operand (op, GET_MODE (op))"))) + +(define_constraint "Dl" + "@internal + In ARM/Thumb-2 state a const_vector which can be used with a Neon vorr or + vbic instruction." + (and (match_code "const_vector") + (match_test "TARGET_32BIT + && imm_for_neon_logic_operand (op, GET_MODE (op))"))) + +(define_constraint "DL" + "@internal + In ARM/Thumb-2 state a const_vector which can be used with a Neon vorn or + vand instruction." + (and (match_code "const_vector") + (match_test "TARGET_32BIT + && imm_for_neon_inv_logic_operand (op, GET_MODE (op))"))) + (define_constraint "Dv" "@internal In ARM/Thumb-2 state a const_double which can be used with a VFP fconsts @@ -171,6 +195,13 @@ (and (match_code "const_double") (match_test "TARGET_32BIT && vfp3_const_double_rtx (op)"))) +(define_memory_constraint "Ut" + "@internal + In ARM/Thumb-2 state an address valid for loading/storing opaque structure + types wider than TImode." + (and (match_code "mem") + (match_test "TARGET_32BIT && neon_struct_mem_operand (op)"))) + (define_memory_constraint "Uv" "@internal In ARM/Thumb-2 state a valid VFP load/store address." @@ -183,6 +214,20 @@ (and (match_code "mem") (match_test "TARGET_32BIT && arm_coproc_mem_operand (op, TRUE)"))) +(define_memory_constraint "Un" + "@internal + In ARM/Thumb-2 state a valid address for Neon element and structure + load/store instructions." + (and (match_code "mem") + (match_test "TARGET_32BIT && neon_vector_mem_operand (op, FALSE)"))) + +(define_memory_constraint "Us" + "@internal + In ARM/Thumb-2 state a valid address for non-offset loads/stores of + quad-word values in four ARM registers." 
+ (and (match_code "mem") + (match_test "TARGET_32BIT && neon_vector_mem_operand (op, TRUE)"))) + (define_memory_constraint "Uq" "@internal In ARM state an address valid in ldrsb instructions." diff --git a/gcc/config/arm/iwmmxt.md b/gcc/config/arm/iwmmxt.md index 10b915d..a7278bf 100644 --- a/gcc/config/arm/iwmmxt.md +++ b/gcc/config/arm/iwmmxt.md @@ -20,6 +20,15 @@ ;; the Free Software Foundation, 51 Franklin Street, Fifth Floor, ;; Boston, MA 02110-1301, USA. +;; Integer element sizes implemented by IWMMXT. +(define_mode_macro VMMX [V2SI V4HI V8QI]) + +;; Integer element sizes for shifts. +(define_mode_macro VSHFT [V4HI V2SI DI]) + +;; Determine element size suffix from vector mode. +(define_mode_attr MMX_char [(V8QI "b") (V4HI "h") (V2SI "w") (DI "d")]) + (define_insn "iwmmxt_iordi3" [(set (match_operand:DI 0 "register_operand" "=y,?&r,?&r") (ior:DI (match_operand:DI 1 "register_operand" "%y,0,r") @@ -239,28 +248,12 @@ ;; Vector add/subtract -(define_insn "addv8qi3" - [(set (match_operand:V8QI 0 "register_operand" "=y") - (plus:V8QI (match_operand:V8QI 1 "register_operand" "y") - (match_operand:V8QI 2 "register_operand" "y")))] - "TARGET_REALLY_IWMMXT" - "waddb%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "addv4hi3" - [(set (match_operand:V4HI 0 "register_operand" "=y") - (plus:V4HI (match_operand:V4HI 1 "register_operand" "y") - (match_operand:V4HI 2 "register_operand" "y")))] - "TARGET_REALLY_IWMMXT" - "waddh%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "addv2si3" - [(set (match_operand:V2SI 0 "register_operand" "=y") - (plus:V2SI (match_operand:V2SI 1 "register_operand" "y") - (match_operand:V2SI 2 "register_operand" "y")))] +(define_insn "*add<mode>3_iwmmxt" + [(set (match_operand:VMMX 0 "register_operand" "=y") + (plus:VMMX (match_operand:VMMX 1 "register_operand" "y") + (match_operand:VMMX 2 "register_operand" "y")))] "TARGET_REALLY_IWMMXT" - "waddw%?\\t%0, %1, %2" + "wadd<MMX_char>%?\\t%0, %1, %2" [(set_attr "predicable" "yes")]) (define_insn "ssaddv8qi3" @@ -311,28 +304,12 @@ "waddwus%?\\t%0, %1, %2" [(set_attr "predicable" "yes")]) -(define_insn "subv8qi3" - [(set (match_operand:V8QI 0 "register_operand" "=y") - (minus:V8QI (match_operand:V8QI 1 "register_operand" "y") - (match_operand:V8QI 2 "register_operand" "y")))] +(define_insn "*sub<mode>3_iwmmxt" + [(set (match_operand:VMMX 0 "register_operand" "=y") + (minus:VMMX (match_operand:VMMX 1 "register_operand" "y") + (match_operand:VMMX 2 "register_operand" "y")))] "TARGET_REALLY_IWMMXT" - "wsubb%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "subv4hi3" - [(set (match_operand:V4HI 0 "register_operand" "=y") - (minus:V4HI (match_operand:V4HI 1 "register_operand" "y") - (match_operand:V4HI 2 "register_operand" "y")))] - "TARGET_REALLY_IWMMXT" - "wsubh%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "subv2si3" - [(set (match_operand:V2SI 0 "register_operand" "=y") - (minus:V2SI (match_operand:V2SI 1 "register_operand" "y") - (match_operand:V2SI 2 "register_operand" "y")))] - "TARGET_REALLY_IWMMXT" - "wsubw%?\\t%0, %1, %2" + "wsub<MMX_char>%?\\t%0, %1, %2" [(set_attr "predicable" "yes")]) (define_insn "sssubv8qi3" @@ -383,7 +360,7 @@ "wsubwus%?\\t%0, %1, %2" [(set_attr "predicable" "yes")]) -(define_insn "mulv4hi3" +(define_insn "*mulv4hi3_iwmmxt" [(set (match_operand:V4HI 0 "register_operand" "=y") (mult:V4HI (match_operand:V4HI 1 "register_operand" "y") (match_operand:V4HI 2 "register_operand" "y")))] @@ -734,100 +711,36 @@ ;; Max/min insns 
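;; The max/min rewrites below follow the same scheme as the add and
;; subtract hunks above: one pattern templated over the VMMX mode macro
;; replaces three per-mode insns, with <MMX_char> supplying the mnemonic
;; suffix.  *smax<mode>3_iwmmxt, for instance, covers wmaxsb (V8QI),
;; wmaxsh (V4HI) and wmaxsw (V2SI).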
-(define_insn "smaxv8qi3" - [(set (match_operand:V8QI 0 "register_operand" "=y") - (smax:V8QI (match_operand:V8QI 1 "register_operand" "y") - (match_operand:V8QI 2 "register_operand" "y")))] +(define_insn "*smax<mode>3_iwmmxt" + [(set (match_operand:VMMX 0 "register_operand" "=y") + (smax:VMMX (match_operand:VMMX 1 "register_operand" "y") + (match_operand:VMMX 2 "register_operand" "y")))] "TARGET_REALLY_IWMMXT" - "wmaxsb%?\\t%0, %1, %2" + "wmaxs<MMX_char>%?\\t%0, %1, %2" [(set_attr "predicable" "yes")]) -(define_insn "umaxv8qi3" - [(set (match_operand:V8QI 0 "register_operand" "=y") - (umax:V8QI (match_operand:V8QI 1 "register_operand" "y") - (match_operand:V8QI 2 "register_operand" "y")))] +(define_insn "*umax<mode>3_iwmmxt" + [(set (match_operand:VMMX 0 "register_operand" "=y") + (umax:VMMX (match_operand:VMMX 1 "register_operand" "y") + (match_operand:VMMX 2 "register_operand" "y")))] "TARGET_REALLY_IWMMXT" - "wmaxub%?\\t%0, %1, %2" + "wmaxu<MMX_char>%?\\t%0, %1, %2" [(set_attr "predicable" "yes")]) -(define_insn "smaxv4hi3" - [(set (match_operand:V4HI 0 "register_operand" "=y") - (smax:V4HI (match_operand:V4HI 1 "register_operand" "y") - (match_operand:V4HI 2 "register_operand" "y")))] - "TARGET_REALLY_IWMMXT" - "wmaxsh%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "umaxv4hi3" - [(set (match_operand:V4HI 0 "register_operand" "=y") - (umax:V4HI (match_operand:V4HI 1 "register_operand" "y") - (match_operand:V4HI 2 "register_operand" "y")))] - "TARGET_REALLY_IWMMXT" - "wmaxuh%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "smaxv2si3" - [(set (match_operand:V2SI 0 "register_operand" "=y") - (smax:V2SI (match_operand:V2SI 1 "register_operand" "y") - (match_operand:V2SI 2 "register_operand" "y")))] - "TARGET_REALLY_IWMMXT" - "wmaxsw%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "umaxv2si3" - [(set (match_operand:V2SI 0 "register_operand" "=y") - (umax:V2SI (match_operand:V2SI 1 "register_operand" "y") - (match_operand:V2SI 2 "register_operand" "y")))] - "TARGET_REALLY_IWMMXT" - "wmaxuw%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "sminv8qi3" - [(set (match_operand:V8QI 0 "register_operand" "=y") - (smin:V8QI (match_operand:V8QI 1 "register_operand" "y") - (match_operand:V8QI 2 "register_operand" "y")))] - "TARGET_REALLY_IWMMXT" - "wminsb%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "uminv8qi3" - [(set (match_operand:V8QI 0 "register_operand" "=y") - (umin:V8QI (match_operand:V8QI 1 "register_operand" "y") - (match_operand:V8QI 2 "register_operand" "y")))] - "TARGET_REALLY_IWMMXT" - "wminub%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "sminv4hi3" - [(set (match_operand:V4HI 0 "register_operand" "=y") - (smin:V4HI (match_operand:V4HI 1 "register_operand" "y") - (match_operand:V4HI 2 "register_operand" "y")))] - "TARGET_REALLY_IWMMXT" - "wminsh%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "uminv4hi3" - [(set (match_operand:V4HI 0 "register_operand" "=y") - (umin:V4HI (match_operand:V4HI 1 "register_operand" "y") - (match_operand:V4HI 2 "register_operand" "y")))] +(define_insn "*smin<mode>3_iwmmxt" + [(set (match_operand:VMMX 0 "register_operand" "=y") + (smin:VMMX (match_operand:VMMX 1 "register_operand" "y") + (match_operand:VMMX 2 "register_operand" "y")))] "TARGET_REALLY_IWMMXT" - "wminuh%?\\t%0, %1, %2" + "wmins<MMX_char>%?\\t%0, %1, %2" [(set_attr "predicable" "yes")]) -(define_insn "sminv2si3" - [(set (match_operand:V2SI 0 "register_operand" 
"=y") - (smin:V2SI (match_operand:V2SI 1 "register_operand" "y") - (match_operand:V2SI 2 "register_operand" "y")))] +(define_insn "*umin<mode>3_iwmmxt" + [(set (match_operand:VMMX 0 "register_operand" "=y") + (umin:VMMX (match_operand:VMMX 1 "register_operand" "y") + (match_operand:VMMX 2 "register_operand" "y")))] "TARGET_REALLY_IWMMXT" - "wminsw%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "uminv2si3" - [(set (match_operand:V2SI 0 "register_operand" "=y") - (umin:V2SI (match_operand:V2SI 1 "register_operand" "y") - (match_operand:V2SI 2 "register_operand" "y")))] - "TARGET_REALLY_IWMMXT" - "wminuw%?\\t%0, %1, %2" + "wminu<MMX_char>%?\\t%0, %1, %2" [(set_attr "predicable" "yes")]) ;; Pack/unpack insns. @@ -1141,76 +1054,28 @@ "wrordg%?\\t%0, %1, %2" [(set_attr "predicable" "yes")]) -(define_insn "ashrv4hi3" - [(set (match_operand:V4HI 0 "register_operand" "=y") - (ashiftrt:V4HI (match_operand:V4HI 1 "register_operand" "y") - (match_operand:SI 2 "register_operand" "z")))] - "TARGET_REALLY_IWMMXT" - "wsrahg%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "ashrv2si3" - [(set (match_operand:V2SI 0 "register_operand" "=y") - (ashiftrt:V2SI (match_operand:V2SI 1 "register_operand" "y") - (match_operand:SI 2 "register_operand" "z")))] - "TARGET_REALLY_IWMMXT" - "wsrawg%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "ashrdi3_iwmmxt" - [(set (match_operand:DI 0 "register_operand" "=y") - (ashiftrt:DI (match_operand:DI 1 "register_operand" "y") - (match_operand:SI 2 "register_operand" "z")))] - "TARGET_REALLY_IWMMXT" - "wsradg%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "lshrv4hi3" - [(set (match_operand:V4HI 0 "register_operand" "=y") - (lshiftrt:V4HI (match_operand:V4HI 1 "register_operand" "y") - (match_operand:SI 2 "register_operand" "z")))] - "TARGET_REALLY_IWMMXT" - "wsrlhg%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "lshrv2si3" - [(set (match_operand:V2SI 0 "register_operand" "=y") - (lshiftrt:V2SI (match_operand:V2SI 1 "register_operand" "y") - (match_operand:SI 2 "register_operand" "z")))] - "TARGET_REALLY_IWMMXT" - "wsrlwg%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn "lshrdi3_iwmmxt" - [(set (match_operand:DI 0 "register_operand" "=y") - (lshiftrt:DI (match_operand:DI 1 "register_operand" "y") - (match_operand:SI 2 "register_operand" "z")))] +(define_insn "ashr<mode>3_iwmmxt" + [(set (match_operand:VSHFT 0 "register_operand" "=y") + (ashiftrt:VSHFT (match_operand:VSHFT 1 "register_operand" "y") + (match_operand:SI 2 "register_operand" "z")))] "TARGET_REALLY_IWMMXT" - "wsrldg%?\\t%0, %1, %2" + "wsra<MMX_char>g%?\\t%0, %1, %2" [(set_attr "predicable" "yes")]) -(define_insn "ashlv4hi3" - [(set (match_operand:V4HI 0 "register_operand" "=y") - (ashift:V4HI (match_operand:V4HI 1 "register_operand" "y") - (match_operand:SI 2 "register_operand" "z")))] +(define_insn "lshr<mode>3_iwmmxt" + [(set (match_operand:VSHFT 0 "register_operand" "=y") + (lshiftrt:VSHFT (match_operand:VSHFT 1 "register_operand" "y") + (match_operand:SI 2 "register_operand" "z")))] "TARGET_REALLY_IWMMXT" - "wsllhg%?\\t%0, %1, %2" + "wsrl<MMX_char>g%?\\t%0, %1, %2" [(set_attr "predicable" "yes")]) -(define_insn "ashlv2si3" - [(set (match_operand:V2SI 0 "register_operand" "=y") - (ashift:V2SI (match_operand:V2SI 1 "register_operand" "y") - (match_operand:SI 2 "register_operand" "z")))] - "TARGET_REALLY_IWMMXT" - "wsllwg%?\\t%0, %1, %2" - [(set_attr "predicable" "yes")]) - -(define_insn 
"ashldi3_iwmmxt" - [(set (match_operand:DI 0 "register_operand" "=y") - (ashift:DI (match_operand:DI 1 "register_operand" "y") - (match_operand:SI 2 "register_operand" "z")))] +(define_insn "ashl<mode>3_iwmmxt" + [(set (match_operand:VSHFT 0 "register_operand" "=y") + (ashift:VSHFT (match_operand:VSHFT 1 "register_operand" "y") + (match_operand:SI 2 "register_operand" "z")))] "TARGET_REALLY_IWMMXT" - "wslldg%?\\t%0, %1, %2" + "wsll<MMX_char>g%?\\t%0, %1, %2" [(set_attr "predicable" "yes")]) (define_insn "rorv4hi3_di" diff --git a/gcc/config/arm/neon-docgen.ml b/gcc/config/arm/neon-docgen.ml new file mode 100644 index 0000000..47d404e --- /dev/null +++ b/gcc/config/arm/neon-docgen.ml @@ -0,0 +1,337 @@ +(* ARM NEON documentation generator. + + Copyright (C) 2006 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it under + the terms of the GNU General Public License as published by the Free + Software Foundation; either version 2, or (at your option) any later + version. + + GCC is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING. If not, write to the Free + Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA + 02110-1301, USA. + + This is an O'Caml program. The O'Caml compiler is available from: + + http://caml.inria.fr/ + + Or from your favourite OS's friendly packaging system. Tested with version + 3.09.2, though other versions will probably work too. + + Compile with: + ocamlc -c neon.ml + ocamlc -o neon-docgen neon.cmo neon-docgen.ml + + Run with: + /path/to/neon-docgen /path/to/gcc/doc/arm-neon-intrinsics.texi +*) + +open Neon + +(* The combined "ops" and "reinterp" table. *) +let ops_reinterp = reinterp @ ops + +(* Helper functions for extracting things from the "ops" table. *) +let single_opcode desired_opcode () = + List.fold_left (fun got_so_far -> + fun row -> + match row with + (opcode, _, _, _, _, _) -> + if opcode = desired_opcode then row :: got_so_far + else got_so_far + ) [] ops_reinterp + +let multiple_opcodes desired_opcodes () = + List.fold_left (fun got_so_far -> + fun desired_opcode -> + (single_opcode desired_opcode ()) @ got_so_far) + [] desired_opcodes + +let ldx_opcode number () = + List.fold_left (fun got_so_far -> + fun row -> + match row with + (opcode, _, _, _, _, _) -> + match opcode with + Vldx n | Vldx_lane n | Vldx_dup n when n = number -> + row :: got_so_far + | _ -> got_so_far + ) [] ops_reinterp + +let stx_opcode number () = + List.fold_left (fun got_so_far -> + fun row -> + match row with + (opcode, _, _, _, _, _) -> + match opcode with + Vstx n | Vstx_lane n when n = number -> + row :: got_so_far + | _ -> got_so_far + ) [] ops_reinterp + +let tbl_opcode () = + List.fold_left (fun got_so_far -> + fun row -> + match row with + (opcode, _, _, _, _, _) -> + match opcode with + Vtbl _ -> row :: got_so_far + | _ -> got_so_far + ) [] ops_reinterp + +let tbx_opcode () = + List.fold_left (fun got_so_far -> + fun row -> + match row with + (opcode, _, _, _, _, _) -> + match opcode with + Vtbx _ -> row :: got_so_far + | _ -> got_so_far + ) [] ops_reinterp + +(* The groups of intrinsics. 
*) +let intrinsic_groups = + [ "Addition", single_opcode Vadd; + "Multiplication", single_opcode Vmul; + "Multiply-accumulate", single_opcode Vmla; + "Multiply-subtract", single_opcode Vmls; + "Subtraction", single_opcode Vsub; + "Comparison (equal-to)", single_opcode Vceq; + "Comparison (greater-than-or-equal-to)", single_opcode Vcge; + "Comparison (less-than-or-equal-to)", single_opcode Vcle; + "Comparison (greater-than)", single_opcode Vcgt; + "Comparison (less-than)", single_opcode Vclt; + "Comparison (absolute greater-than-or-equal-to)", single_opcode Vcage; + "Comparison (absolute less-than-or-equal-to)", single_opcode Vcale; + "Comparison (absolute greater-than)", single_opcode Vcagt; + "Comparison (absolute less-than)", single_opcode Vcalt; + "Test bits", single_opcode Vtst; + "Absolute difference", single_opcode Vabd; + "Absolute difference and accumulate", single_opcode Vaba; + "Maximum", single_opcode Vmax; + "Minimum", single_opcode Vmin; + "Pairwise add", single_opcode Vpadd; + "Pairwise add, widen and accumulate", single_opcode Vpada; + "Folding maximum", single_opcode Vpmax; + "Folding minimum", single_opcode Vpmin; + "Reciprocal step", multiple_opcodes [Vrecps; Vrsqrts]; + "Vector shift left", single_opcode Vshl; + "Vector shift left by constant", single_opcode Vshl_n; + "Vector shift right by constant", single_opcode Vshr_n; + "Vector shift right by constant and accumulate", single_opcode Vsra_n; + "Vector shift right and insert", single_opcode Vsri; + "Vector shift left and insert", single_opcode Vsli; + "Absolute value", single_opcode Vabs; + "Negation", single_opcode Vneg; + "Bitwise not", single_opcode Vmvn; + "Count leading sign bits", single_opcode Vcls; + "Count leading zeros", single_opcode Vclz; + "Count number of set bits", single_opcode Vcnt; + "Reciprocal estimate", single_opcode Vrecpe; + "Reciprocal square-root estimate", single_opcode Vrsqrte; + "Get lanes from a vector", single_opcode Vget_lane; + "Set lanes in a vector", single_opcode Vset_lane; + "Create vector from literal bit pattern", single_opcode Vcreate; + "Set all lanes to the same value", + multiple_opcodes [Vdup_n; Vmov_n; Vdup_lane]; + "Combining vectors", single_opcode Vcombine; + "Splitting vectors", multiple_opcodes [Vget_high; Vget_low]; + "Conversions", multiple_opcodes [Vcvt; Vcvt_n]; + "Move, narrowing", single_opcode Vmovn; + "Move, long", single_opcode Vmovl; + "Table lookup", tbl_opcode; + "Extended table lookup", tbx_opcode; + "Multiply, lane", single_opcode Vmul_lane; + "Long multiply, lane", single_opcode Vmull_lane; + "Saturating doubling long multiply, lane", single_opcode Vqdmull_lane; + "Saturating doubling multiply high, lane", single_opcode Vqdmulh_lane; + "Multiply-accumulate, lane", single_opcode Vmla_lane; + "Multiply-subtract, lane", single_opcode Vmls_lane; + "Vector multiply by scalar", single_opcode Vmul_n; + "Vector long multiply by scalar", single_opcode Vmull_n; + "Vector saturating doubling long multiply by scalar", + single_opcode Vqdmull_n; + "Vector saturating doubling multiply high by scalar", + single_opcode Vqdmulh_n; + "Vector multiply-accumulate by scalar", single_opcode Vmla_n; + "Vector multiply-subtract by scalar", single_opcode Vmls_n; + "Vector extract", single_opcode Vext; + "Reverse elements", multiple_opcodes [Vrev64; Vrev32; Vrev16]; + "Bit selection", single_opcode Vbsl; + "Transpose elements", single_opcode Vtrn; + "Zip elements", single_opcode Vzip; + "Unzip elements", single_opcode Vuzp; +
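(* The element/structure load/store groups below use the ldx_opcode and
   stx_opcode extractors defined above, which select the Vldx, Vldx_lane
   and Vldx_dup (respectively Vstx and Vstx_lane) rows whose numeric
   argument matches. *)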
"Element/structure loads, VLD1 variants", ldx_opcode 1; + "Element/structure stores, VST1 variants", stx_opcode 1; + "Element/structure loads, VLD2 variants", ldx_opcode 2; + "Element/structure stores, VST2 variants", stx_opcode 2; + "Element/structure loads, VLD3 variants", ldx_opcode 3; + "Element/structure stores, VST3 variants", stx_opcode 3; + "Element/structure loads, VLD4 variants", ldx_opcode 4; + "Element/structure stores, VST4 variants", stx_opcode 4; + "Logical operations (AND)", single_opcode Vand; + "Logical operations (OR)", single_opcode Vorr; + "Logical operations (exclusive OR)", single_opcode Veor; + "Logical operations (AND-NOT)", single_opcode Vbic; + "Logical operations (OR-NOT)", single_opcode Vorn; + "Reinterpret casts", single_opcode Vreinterp ] + +(* Given an intrinsic shape, produce a string to document the corresponding + operand shapes. *) +let rec analyze_shape shape = + let rec n_things n thing = + match n with + 0 -> [] + | n -> thing :: (n_things (n - 1) thing) + in + let rec analyze_shape_elt reg_no elt = + match elt with + Dreg -> "@var{d" ^ (string_of_int reg_no) ^ "}" + | Qreg -> "@var{q" ^ (string_of_int reg_no) ^ "}" + | Corereg -> "@var{r" ^ (string_of_int reg_no) ^ "}" + | Immed -> "#@var{0}" + | VecArray (1, elt) -> + let elt_regexp = analyze_shape_elt 0 elt in + "@{" ^ elt_regexp ^ "@}" + | VecArray (n, elt) -> + let rec f m = + match m with + 0 -> [] + | m -> (analyze_shape_elt (m - 1) elt) :: (f (m - 1)) + in + let ops = List.rev (f n) in + "@{" ^ (commas (fun x -> x) ops "") ^ "@}" + | (PtrTo elt | CstPtrTo elt) -> + "[" ^ (analyze_shape_elt reg_no elt) ^ "]" + | Element_of_dreg -> (analyze_shape_elt reg_no Dreg) ^ "[@var{0}]" + | Element_of_qreg -> (analyze_shape_elt reg_no Qreg) ^ "[@var{0}]" + | All_elements_of_dreg -> (analyze_shape_elt reg_no Dreg) ^ "[]" + in + match shape with + All (n, elt) -> commas (analyze_shape_elt 0) (n_things n elt) "" + | Long -> (analyze_shape_elt 0 Qreg) ^ ", " ^ (analyze_shape_elt 0 Dreg) ^ + ", " ^ (analyze_shape_elt 0 Dreg) + | Long_noreg elt -> (analyze_shape_elt 0 elt) ^ ", " ^ + (analyze_shape_elt 0 elt) + | Wide -> (analyze_shape_elt 0 Qreg) ^ ", " ^ (analyze_shape_elt 0 Qreg) ^ + ", " ^ (analyze_shape_elt 0 Dreg) + | Wide_noreg elt -> analyze_shape (Long_noreg elt) + | Narrow -> (analyze_shape_elt 0 Dreg) ^ ", " ^ (analyze_shape_elt 0 Qreg) ^ + ", " ^ (analyze_shape_elt 0 Qreg) + | Use_operands elts -> commas (analyze_shape_elt 0) (Array.to_list elts) "" + | By_scalar Dreg -> + analyze_shape (Use_operands [| Dreg; Dreg; Element_of_dreg |]) + | By_scalar Qreg -> + analyze_shape (Use_operands [| Qreg; Qreg; Element_of_dreg |]) + | By_scalar _ -> assert false + | Wide_lane -> + analyze_shape (Use_operands [| Qreg; Dreg; Element_of_dreg |]) + | Wide_scalar -> + analyze_shape (Use_operands [| Qreg; Dreg; Element_of_dreg |]) + | Pair_result elt -> + let elt_regexp = analyze_shape_elt 0 elt in + let elt_regexp' = analyze_shape_elt 1 elt in + elt_regexp ^ ", " ^ elt_regexp' + | Unary_scalar _ -> "FIXME Unary_scalar" + | Binary_imm elt -> analyze_shape (Use_operands [| elt; elt; Immed |]) + | Narrow_imm -> analyze_shape (Use_operands [| Dreg; Qreg; Immed |]) + | Long_imm -> analyze_shape (Use_operands [| Qreg; Dreg; Immed |]) + +(* Document a single intrinsic. 
*) +let describe_intrinsic first chan + (elt_ty, (_, features, shape, name, munge, _)) = + let c_arity, new_elt_ty = munge shape elt_ty in + let c_types = strings_of_arity c_arity in + Printf.fprintf chan "@itemize @bullet\n"; + let item_code = if first then "@item" else "@itemx" in + Printf.fprintf chan "%s %s %s_%s (" item_code (List.hd c_types) + (intrinsic_name name) (string_of_elt elt_ty); + Printf.fprintf chan "%s)\n" (commas (fun ty -> ty) (List.tl c_types) ""); + if not (List.exists (fun feature -> feature = No_op) features) then + begin + let print_one_insn name = + Printf.fprintf chan "@code{"; + let no_suffix = (new_elt_ty = NoElts) in + let name_with_suffix = + if no_suffix then name + else name ^ "." ^ (string_of_elt_dots new_elt_ty) + in + let possible_operands = analyze_all_shapes features shape + analyze_shape + in + let rec print_one_possible_operand op = + Printf.fprintf chan "%s %s}" name_with_suffix op + in + (* If the intrinsic expands to multiple instructions, we assume + they are all of the same form. *) + print_one_possible_operand (List.hd possible_operands) + in + let rec print_insns names = + match names with + [] -> () + | [name] -> print_one_insn name + | name::names -> (print_one_insn name; + Printf.fprintf chan " @emph{or} "; + print_insns names) + in + let insn_names = get_insn_names features name in + Printf.fprintf chan "@*@emph{Form of expected instruction(s):} "; + print_insns insn_names; + Printf.fprintf chan "\n" + end; + Printf.fprintf chan "@end itemize\n"; + Printf.fprintf chan "\n\n" + +(* Document a group of intrinsics. *) +let document_group chan (group_title, group_extractor) = + (* Extract the rows in question from the ops table and then turn them + into a list of intrinsics. *) + let intrinsics = + List.fold_left (fun got_so_far -> + fun row -> + match row with + (_, _, _, _, _, elt_tys) -> + List.fold_left (fun got_so_far' -> + fun elt_ty -> + (elt_ty, row) :: got_so_far') + got_so_far elt_tys + ) [] (group_extractor ()) + in + (* Emit the title for this group. *) + Printf.fprintf chan "@subsubsection %s\n\n" group_title; + (* Emit a description of each intrinsic. *) + List.iter (describe_intrinsic true chan) intrinsics; + (* Close this group. *) + Printf.fprintf chan "\n\n" + +let gnu_header chan = + List.iter (fun s -> Printf.fprintf chan "%s\n" s) [ + "@c Copyright (C) 2006 Free Software Foundation, Inc."; + "@c This is part of the GCC manual."; + "@c For copying conditions, see the file gcc.texi."; + ""; + "@c This file is generated automatically using gcc/config/arm/neon-docgen.ml"; + "@c Please do not edit manually."] + +(* Program entry point. *) +let _ = + if Array.length Sys.argv <> 2 then + failwith "Usage: neon-docgen <output filename>" + else + let file = Sys.argv.(1) in + try + let chan = open_out file in + gnu_header chan; + List.iter (document_group chan) intrinsic_groups; + close_out chan + with Sys_error sys -> + failwith ("Could not create output file " ^ file ^ ": " ^ sys) diff --git a/gcc/config/arm/neon-gen.ml b/gcc/config/arm/neon-gen.ml new file mode 100644 index 0000000..1f26fcb --- /dev/null +++ b/gcc/config/arm/neon-gen.ml @@ -0,0 +1,419 @@ +(* Auto-generate ARM Neon intrinsics header file. + Copyright (C) 2006, 2007 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is part of GCC. 
+ + GCC is free software; you can redistribute it and/or modify it under + the terms of the GNU General Public License as published by the Free + Software Foundation; either version 2, or (at your option) any later + version. + + GCC is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING. If not, write to the Free + Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA + 02110-1301, USA. + + This is an O'Caml program. The O'Caml compiler is available from: + + http://caml.inria.fr/ + + Or from your favourite OS's friendly packaging system. Tested with version + 3.09.2, though other versions will probably work too. + + Compile with: + ocamlc -c neon.ml + ocamlc -o neon-gen neon.cmo neon-gen.ml + + Run with: + ./neon-gen > arm_neon.h +*) + +open Neon + +(* The format codes used in the following functions are documented at: + http://caml.inria.fr/pub/docs/manual-ocaml/libref/Format.html\ + #6_printflikefunctionsforprettyprinting + (one line, remove the backslash.) +*) + +(* Following functions can be used to approximate GNU indentation style. *) +let start_function () = + Format.printf "@[<v 0>"; + ref 0 + +let end_function nesting = + match !nesting with + 0 -> Format.printf "@;@;@]" + | _ -> failwith ("Bad nesting (ending function at level " + ^ (string_of_int !nesting) ^ ")") + +let open_braceblock nesting = + begin match !nesting with + 0 -> Format.printf "@,@<0>{@[<v 2>@," + | _ -> Format.printf "@,@[<v 2> @<0>{@[<v 2>@," + end; + incr nesting + +let close_braceblock nesting = + decr nesting; + match !nesting with + 0 -> Format.printf "@]@,@<0>}" + | _ -> Format.printf "@]@,@<0>}@]" + +let print_function arity fnname body = + let ffmt = start_function () in + Format.printf "__extension__ static __inline "; + let inl = "__attribute__ ((__always_inline__))" in + begin match arity with + Arity0 ret -> + Format.printf "%s %s@,%s (void)" (string_of_vectype ret) inl fnname + | Arity1 (ret, arg0) -> + Format.printf "%s %s@,%s (%s __a)" (string_of_vectype ret) inl fnname + (string_of_vectype arg0) + | Arity2 (ret, arg0, arg1) -> + Format.printf "%s %s@,%s (%s __a, %s __b)" + (string_of_vectype ret) inl fnname (string_of_vectype arg0) + (string_of_vectype arg1) + | Arity3 (ret, arg0, arg1, arg2) -> + Format.printf "%s %s@,%s (%s __a, %s __b, %s __c)" + (string_of_vectype ret) inl fnname (string_of_vectype arg0) + (string_of_vectype arg1) (string_of_vectype arg2) + | Arity4 (ret, arg0, arg1, arg2, arg3) -> + Format.printf "%s %s@,%s (%s __a, %s __b, %s __c, %s __d)" + (string_of_vectype ret) inl fnname (string_of_vectype arg0) + (string_of_vectype arg1) (string_of_vectype arg2) + (string_of_vectype arg3) + end; + open_braceblock ffmt; + let rec print_lines = function + [] -> () + | [line] -> Format.printf "%s" line + | line::lines -> Format.printf "%s@," line; print_lines lines in + print_lines body; + close_braceblock ffmt; + end_function ffmt + +let return_by_ptr features = List.mem ReturnPtr features + +let union_string num elts base = + let itype = inttype_for_array num elts in + let iname = string_of_inttype itype + and sname = string_of_vectype (T_arrayof (num, elts)) in + Printf.sprintf "union { %s __i; %s __o; } %s" sname iname base + +let rec signed_ctype = function + T_uint8x8 | T_poly8x8 -> T_int8x8 + 
| T_uint8x16 | T_poly8x16 -> T_int8x16 + | T_uint16x4 | T_poly16x4 -> T_int16x4 + | T_uint16x8 | T_poly16x8 -> T_int16x8 + | T_uint32x2 -> T_int32x2 + | T_uint32x4 -> T_int32x4 + | T_uint64x1 -> T_int64x1 + | T_uint64x2 -> T_int64x2 + (* Cast to types defined by mode in arm.c, not random types pulled in from + the <stdint.h> header in use. This fixes incompatible pointer errors when + compiling with C++. *) + | T_uint8 | T_int8 -> T_intQI + | T_uint16 | T_int16 -> T_intHI + | T_uint32 | T_int32 -> T_intSI + | T_uint64 | T_int64 -> T_intDI + | T_poly8 -> T_intQI + | T_poly16 -> T_intHI + | T_arrayof (n, elt) -> T_arrayof (n, signed_ctype elt) + | T_ptrto elt -> T_ptrto (signed_ctype elt) + | T_const elt -> T_const (signed_ctype elt) + | x -> x + +let add_cast ctype cval = + let stype = signed_ctype ctype in + if ctype <> stype then + Printf.sprintf "(%s) %s" (string_of_vectype stype) cval + else + cval + +let cast_for_return to_ty = "(" ^ (string_of_vectype to_ty) ^ ")" + +(* Return a tuple of a list of declarations to go at the start of the function, + and a list of statements needed to return THING. *) +let return arity return_by_ptr thing = + match arity with + Arity0 (ret) | Arity1 (ret, _) | Arity2 (ret, _, _) | Arity3 (ret, _, _, _) + | Arity4 (ret, _, _, _, _) -> + match ret with + T_arrayof (num, vec) -> + if return_by_ptr then + let sname = string_of_vectype ret in + [Printf.sprintf "%s __rv;" sname], + [thing ^ ";"; "return __rv;"] + else + let uname = union_string num vec "__rv" in + [uname ^ ";"], ["__rv.__o = " ^ thing ^ ";"; "return __rv.__i;"] + | T_void -> [], [thing ^ ";"] + | _ -> + [], ["return " ^ (cast_for_return ret) ^ thing ^ ";"] + +let rec element_type ctype = + match ctype with + T_arrayof (_, v) -> element_type v + | _ -> ctype + +let params return_by_ptr ps = + let pdecls = ref [] in + let ptype t p = + match t with + T_arrayof (num, elts) -> + let uname = union_string num elts (p ^ "u") in + let decl = Printf.sprintf "%s = { %s };" uname p in + pdecls := decl :: !pdecls; + p ^ "u.__o" + | _ -> add_cast t p in + let plist = match ps with + Arity0 _ -> [] + | Arity1 (_, t1) -> [ptype t1 "__a"] + | Arity2 (_, t1, t2) -> [ptype t1 "__a"; ptype t2 "__b"] + | Arity3 (_, t1, t2, t3) -> [ptype t1 "__a"; ptype t2 "__b"; ptype t3 "__c"] + | Arity4 (_, t1, t2, t3, t4) -> + [ptype t1 "__a"; ptype t2 "__b"; ptype t3 "__c"; ptype t4 "__d"] in + match ps with + Arity0 ret | Arity1 (ret, _) | Arity2 (ret, _, _) | Arity3 (ret, _, _, _) + | Arity4 (ret, _, _, _, _) -> + if return_by_ptr then + !pdecls, add_cast (T_ptrto (element_type ret)) "&__rv.val[0]" :: plist + else + !pdecls, plist + +let modify_params features plist = + let is_flipped = + List.exists (function Flipped _ -> true | _ -> false) features in + if is_flipped then + match plist with + [ a; b ] -> [ b; a ] + | _ -> + failwith ("Don't know how to flip args " ^ (String.concat ", " plist)) + else + plist + +(* !!! Decide whether to add an extra information word based on the shape + form. *) +let extra_word shape features paramlist bits = + let use_word = + match shape with + All _ | Long | Long_noreg _ | Wide | Wide_noreg _ | Narrow + | By_scalar _ | Wide_scalar | Wide_lane | Binary_imm _ | Long_imm + | Narrow_imm -> true + | _ -> List.mem InfoWord features + in + if use_word then + paramlist @ [string_of_int bits] + else + paramlist + +(* Bit 0 represents signed (1) vs unsigned (0), or float (1) vs poly (0). + Bit 1 represents floats & polynomials (1), or ordinary integers (0). 
+ Bit 2 represents rounding (1) vs none (0). *) +let infoword_value elttype features = + let bits01 = + match elt_class elttype with + Signed | ConvClass (Signed, _) | ConvClass (_, Signed) -> 0b001 + | Poly -> 0b010 + | Float -> 0b011 + | _ -> 0b000 + and rounding_bit = if List.mem Rounding features then 0b100 else 0b000 in + bits01 lor rounding_bit + +(* "Cast" type operations will throw an exception in mode_of_elt (actually in + elt_width, called from there). Deal with that here, and generate a suffix + with multiple modes (<to><from>). *) +let rec mode_suffix elttype shape = + try + let mode = mode_of_elt elttype shape in + string_of_mode mode + with MixedMode (dst, src) -> + let dstmode = mode_of_elt dst shape + and srcmode = mode_of_elt src shape in + string_of_mode dstmode ^ string_of_mode srcmode + +let print_variant opcode features shape name (ctype, asmtype, elttype) = + let bits = infoword_value elttype features in + let modesuf = mode_suffix elttype shape in + let return_by_ptr = return_by_ptr features in + let pdecls, paramlist = params return_by_ptr ctype in + let paramlist' = modify_params features paramlist in + let paramlist'' = extra_word shape features paramlist' bits in + let parstr = String.concat ", " paramlist'' in + let builtin = Printf.sprintf "__builtin_neon_%s%s (%s)" + (builtin_name features name) modesuf parstr in + let rdecls, stmts = return ctype return_by_ptr builtin in + let body = pdecls @ rdecls @ stmts + and fnname = (intrinsic_name name) ^ "_" ^ (string_of_elt elttype) in + print_function ctype fnname body + +(* When this function processes the element types in the ops table, it rewrites + them into a list of tuples (a,b,c): + a : C type as an "arity", e.g. Arity1 (T_poly8x8, T_poly8x8) + b : Asm type : a single, processed element type, e.g. P16. This is the + type which should be attached to the asm opcode. + c : Variant type : the unprocessed type for this variant (e.g. in add + instructions which don't care about the sign, b might be i16 and c + might be s16.) +*) + +let print_op (opcode, features, shape, name, munge, types) = + let sorted_types = List.sort compare types in + let munged_types = List.map + (fun elt -> let c, asm = munge shape elt in c, asm, elt) sorted_types in + List.iter + (fun variant -> print_variant opcode features shape name variant) + munged_types + +let print_ops ops = + List.iter print_op ops + +(* Output type definitions. Table entries are: + cbase : "C" name for the type. + abase : "ARM" base name for the type (i.e. int in int8x8_t). + esize : element size. + enum : element count. +*) + +let deftypes () = + let typeinfo = [ + (* Doubleword vector types. *) + "__builtin_neon_qi", "int", 8, 8; + "__builtin_neon_hi", "int", 16, 4; + "__builtin_neon_si", "int", 32, 2; + "__builtin_neon_di", "int", 64, 1; + "__builtin_neon_sf", "float", 32, 2; + "__builtin_neon_poly8", "poly", 8, 8; + "__builtin_neon_poly16", "poly", 16, 4; + "__builtin_neon_uqi", "uint", 8, 8; + "__builtin_neon_uhi", "uint", 16, 4; + "__builtin_neon_usi", "uint", 32, 2; + "__builtin_neon_udi", "uint", 64, 1; + + (* Quadword vector types.
*) + "__builtin_neon_qi", "int", 8, 16; + "__builtin_neon_hi", "int", 16, 8; + "__builtin_neon_si", "int", 32, 4; + "__builtin_neon_di", "int", 64, 2; + "__builtin_neon_sf", "float", 32, 4; + "__builtin_neon_poly8", "poly", 8, 16; + "__builtin_neon_poly16", "poly", 16, 8; + "__builtin_neon_uqi", "uint", 8, 16; + "__builtin_neon_uhi", "uint", 16, 8; + "__builtin_neon_usi", "uint", 32, 4; + "__builtin_neon_udi", "uint", 64, 2 + ] in + List.iter + (fun (cbase, abase, esize, enum) -> + let attr = + match enum with + 1 -> "" + | _ -> Printf.sprintf "\t__attribute__ ((__vector_size__ (%d)))" + (esize * enum / 8) in + Format.printf "typedef %s %s%dx%d_t%s;@\n" cbase abase esize enum attr) + typeinfo; + Format.print_newline (); + (* Extra types not in <stdint.h>. *) + Format.printf "typedef __builtin_neon_sf float32_t;\n"; + Format.printf "typedef __builtin_neon_poly8 poly8_t;\n"; + Format.printf "typedef __builtin_neon_poly16 poly16_t;\n" + +(* Output structs containing arrays, for load & store instructions etc. *) + +let arrtypes () = + let typeinfo = [ + "int", 8; "int", 16; + "int", 32; "int", 64; + "uint", 8; "uint", 16; + "uint", 32; "uint", 64; + "float", 32; "poly", 8; + "poly", 16 + ] in + let writestruct elname elsize regsize arrsize = + let elnum = regsize / elsize in + let structname = + Printf.sprintf "%s%dx%dx%d_t" elname elsize elnum arrsize in + let sfmt = start_function () in + Format.printf "typedef struct %s" structname; + open_braceblock sfmt; + Format.printf "%s%dx%d_t val[%d];" elname elsize elnum arrsize; + close_braceblock sfmt; + Format.printf " %s;" structname; + end_function sfmt; + in + for n = 2 to 4 do + List.iter + (fun (elname, elsize) -> + writestruct elname elsize 64 n; + writestruct elname elsize 128 n) + typeinfo + done + +let print_lines = List.iter (fun s -> Format.printf "%s@\n" s) + +(* Do it. *) + +let _ = + print_lines [ +"/* ARM NEON intrinsics include file. This file is generated automatically"; +" using neon-gen.ml. Please do not edit manually."; +""; +" Copyright (C) 2006, 2007 Free Software Foundation, Inc."; +" Contributed by CodeSourcery."; +""; +" This file is part of GCC."; +""; +" GCC is free software; you can redistribute it and/or modify it"; +" under the terms of the GNU General Public License as published"; +" by the Free Software Foundation; either version 2, or (at your"; +" option) any later version."; +""; +" GCC is distributed in the hope that it will be useful, but WITHOUT"; +" ANY WARRANTY; without even the implied warranty of MERCHANTABILITY"; +" or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public"; +" License for more details."; +""; +" You should have received a copy of the GNU General Public License"; +" along with GCC; see the file COPYING. If not, write to the"; +" Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston,"; +" MA 02110-1301, USA. */"; +""; +"/* As a special exception, if you include this header file into source"; +" files compiled by GCC, this header file does not by itself cause"; +" the resulting executable to be covered by the GNU General Public"; +" License. This exception does not however invalidate any other"; +" reasons why the executable file might be covered by the GNU General"; +" Public License. */"; +""; +"#ifndef _GCC_ARM_NEON_H"; +"#define _GCC_ARM_NEON_H 1"; +""; +"#ifndef __ARM_NEON__"; +"#error You must enable NEON instructions (e.g. 
-mfloat-abi=softfp -mfpu=neon) to use arm_neon.h"; +"#else"; +""; +"#ifdef __cplusplus"; +"extern \"C\" {"; +"#endif"; +""; +"#include <stdint.h>"; +""]; + deftypes (); + arrtypes (); + Format.print_newline (); + print_ops ops; + Format.print_newline (); + print_ops reinterp; + print_lines [ +"#ifdef __cplusplus"; +"}"; +"#endif"; +"#endif"; +"#endif"] diff --git a/gcc/config/arm/neon-testgen.ml b/gcc/config/arm/neon-testgen.ml new file mode 100644 index 0000000..56e11d4 --- /dev/null +++ b/gcc/config/arm/neon-testgen.ml @@ -0,0 +1,277 @@ +(* Auto-generate ARM Neon intrinsics tests. + Copyright (C) 2006 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it under + the terms of the GNU General Public License as published by the Free + Software Foundation; either version 2, or (at your option) any later + version. + + GCC is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING. If not, write to the Free + Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA + 02110-1301, USA. + + This is an O'Caml program. The O'Caml compiler is available from: + + http://caml.inria.fr/ + + Or from your favourite OS's friendly packaging system. Tested with version + 3.09.2, though other versions will probably work too. + + Compile with: + ocamlc -c neon.ml + ocamlc -o neon-testgen neon.cmo neon-testgen.ml + + Run with: + cd /path/to/gcc/testsuite/gcc.target/arm/neon + /path/to/neon-testgen +*) + +open Neon + +type c_type_flags = Pointer | Const + +(* Open a test source file. *) +let open_test_file dir name = + try + open_out (dir ^ "/" ^ name ^ ".c") + with Sys_error str -> + failwith ("Could not create test source file " ^ name ^ ": " ^ str) + +(* Emit prologue code to a test source file. *) +let emit_prologue chan test_name = + Printf.fprintf chan "/* Test the `%s' ARM Neon intrinsic. */\n" test_name; + Printf.fprintf chan "/* This file was autogenerated by neon-testgen. */\n\n"; + Printf.fprintf chan "/* { dg-do assemble } */\n"; + Printf.fprintf chan "/* { dg-require-effective-target arm_neon_ok } */\n"; + Printf.fprintf chan + "/* { dg-options \"-save-temps -O0 -mfpu=neon -mfloat-abi=softfp\" } */\n"; + Printf.fprintf chan "\n#include \"arm_neon.h\"\n\n"; + Printf.fprintf chan "void test_%s (void)\n{\n" test_name + +(* Emit declarations of local variables that are going to be passed + to an intrinsic, together with one to take a returned value if needed. *) +let emit_automatics chan c_types = + let emit () = + ignore ( + List.fold_left (fun arg_number -> fun (flags, ty) -> + let pointer_bit = + if List.mem Pointer flags then "*" else "" + in + (* Const arguments to builtins are directly + written in as constants. *) + if not (List.mem Const flags) then + Printf.fprintf chan " %s %sarg%d_%s;\n" + ty pointer_bit arg_number ty; + arg_number + 1) + 0 (List.tl c_types)) + in + match c_types with + (_, return_ty) :: tys -> + if return_ty <> "void" then + (* The intrinsic returns a value. *) + (Printf.fprintf chan " %s out_%s;\n" return_ty return_ty; + emit ()) + else + (* The intrinsic does not return a value. *) + emit () + | _ -> assert false + +(* Emit code to call an intrinsic. 
*) +let emit_call chan const_valuator c_types name elt_ty = + (if snd (List.hd c_types) <> "void" then + Printf.fprintf chan " out_%s = " (snd (List.hd c_types)) + else + Printf.fprintf chan " "); + Printf.fprintf chan "%s_%s (" (intrinsic_name name) (string_of_elt elt_ty); + let print_arg chan arg_number (flags, ty) = + (* If the argument is of const type, then directly write in the + constant now. *) + if List.mem Const flags then + match const_valuator with + None -> + if List.mem Pointer flags then + Printf.fprintf chan "0" + else + Printf.fprintf chan "1" + | Some f -> Printf.fprintf chan "%s" (string_of_int (f arg_number)) + else + Printf.fprintf chan "arg%d_%s" arg_number ty + in + let rec print_args arg_number tys = + match tys with + [] -> () + | [ty] -> print_arg chan arg_number ty + | ty::tys -> + print_arg chan arg_number ty; + Printf.fprintf chan ", "; + print_args (arg_number + 1) tys + in + print_args 0 (List.tl c_types); + Printf.fprintf chan ");\n" + +(* Emit epilogue code to a test source file. *) +let emit_epilogue chan features regexps = + let no_op = List.exists (fun feature -> feature = No_op) features in + Printf.fprintf chan "}\n\n"; + (if not no_op then + List.iter (fun regexp -> + Printf.fprintf chan + "/* { dg-final { scan-assembler \"%s\" } } */\n" regexp) + regexps + else + () + ); + Printf.fprintf chan "/* { dg-final { cleanup-saved-temps } } */\n" + +(* Check a list of C types to determine which ones are pointers and which + ones are const. *) +let check_types tys = + let tys' = + List.map (fun ty -> + let len = String.length ty in + if len > 2 && String.get ty (len - 2) = ' ' + && String.get ty (len - 1) = '*' + then ([Pointer], String.sub ty 0 (len - 2)) + else ([], ty)) tys + in + List.map (fun (flags, ty) -> + if String.length ty > 6 && String.sub ty 0 6 = "const " + then (Const :: flags, String.sub ty 6 ((String.length ty) - 6)) + else (flags, ty)) tys' + +(* Given an intrinsic shape, produce a regexp that will match + the right-hand sides of instructions generated by an intrinsic of + that shape. 
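+ For example, a doubleword register operand (Dreg, below) is matched by + the Tcl-quoted regexp \[dD\]\[0-9\]+, so a shape with two such operands + matches right-hand sides such as "d0, d1".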
*) +let rec analyze_shape shape = + let rec n_things n thing = + match n with + 0 -> [] + | n -> thing :: (n_things (n - 1) thing) + in + let rec analyze_shape_elt elt = + match elt with + Dreg -> "\\[dD\\]\\[0-9\\]+" + | Qreg -> "\\[qQ\\]\\[0-9\\]+" + | Corereg -> "\\[rR\\]\\[0-9\\]+" + | Immed -> "#\\[0-9\\]+" + | VecArray (1, elt) -> + let elt_regexp = analyze_shape_elt elt in + "((\\\\\\{" ^ elt_regexp ^ "\\\\\\})|(" ^ elt_regexp ^ "))" + | VecArray (n, elt) -> + let elt_regexp = analyze_shape_elt elt in + let alt1 = elt_regexp ^ "-" ^ elt_regexp in + let alt2 = commas (fun x -> x) (n_things n elt_regexp) "" in + "\\\\\\{((" ^ alt1 ^ ")|(" ^ alt2 ^ "))\\\\\\}" + | (PtrTo elt | CstPtrTo elt) -> + "\\\\\\[" ^ (analyze_shape_elt elt) ^ "\\\\\\]" + | Element_of_dreg -> (analyze_shape_elt Dreg) ^ "\\\\\\[\\[0-9\\]+\\\\\\]" + | Element_of_qreg -> (analyze_shape_elt Qreg) ^ "\\\\\\[\\[0-9\\]+\\\\\\]" + | All_elements_of_dreg -> (analyze_shape_elt Dreg) ^ "\\\\\\[\\\\\\]" + in + match shape with + All (n, elt) -> commas analyze_shape_elt (n_things n elt) "" + | Long -> (analyze_shape_elt Qreg) ^ ", " ^ (analyze_shape_elt Dreg) ^ + ", " ^ (analyze_shape_elt Dreg) + | Long_noreg elt -> (analyze_shape_elt elt) ^ ", " ^ (analyze_shape_elt elt) + | Wide -> (analyze_shape_elt Qreg) ^ ", " ^ (analyze_shape_elt Qreg) ^ + ", " ^ (analyze_shape_elt Dreg) + | Wide_noreg elt -> analyze_shape (Long_noreg elt) + | Narrow -> (analyze_shape_elt Dreg) ^ ", " ^ (analyze_shape_elt Qreg) ^ + ", " ^ (analyze_shape_elt Qreg) + | Use_operands elts -> commas analyze_shape_elt (Array.to_list elts) "" + | By_scalar Dreg -> + analyze_shape (Use_operands [| Dreg; Dreg; Element_of_dreg |]) + | By_scalar Qreg -> + analyze_shape (Use_operands [| Qreg; Qreg; Element_of_dreg |]) + | By_scalar _ -> assert false + | Wide_lane -> + analyze_shape (Use_operands [| Qreg; Dreg; Element_of_dreg |]) + | Wide_scalar -> + analyze_shape (Use_operands [| Qreg; Dreg; Element_of_dreg |]) + | Pair_result elt -> + let elt_regexp = analyze_shape_elt elt in + elt_regexp ^ ", " ^ elt_regexp + | Unary_scalar _ -> "FIXME Unary_scalar" + | Binary_imm elt -> analyze_shape (Use_operands [| elt; elt; Immed |]) + | Narrow_imm -> analyze_shape (Use_operands [| Dreg; Qreg; Immed |]) + | Long_imm -> analyze_shape (Use_operands [| Qreg; Dreg; Immed |]) + +(* Generate tests for one intrinsic. *) +let test_intrinsic dir opcode features shape name munge elt_ty = + (* Open the test source file. *) + let test_name = name ^ (string_of_elt elt_ty) in + let chan = open_test_file dir test_name in + (* Work out what argument and return types the intrinsic has. *) + let c_arity, new_elt_ty = munge shape elt_ty in + let c_types = check_types (strings_of_arity c_arity) in + (* Extract any constant valuator (a function specifying what constant + values are to be written into the intrinsic call) from the features + list. *) + let const_valuator = + try + match (List.find (fun feature -> match feature with + Const_valuator _ -> true + | _ -> false) features) with + Const_valuator f -> Some f + | _ -> assert false + with Not_found -> None + in + (* Work out what instruction name(s) to expect. *) + let insns = get_insn_names features name in + let no_suffix = (new_elt_ty = NoElts) in + let insns = + if no_suffix then insns + else List.map (fun insn -> + let suffix = string_of_elt_dots new_elt_ty in + insn ^ "\\." ^ suffix) insns + in + (* Construct a regexp to match against the expected instruction name(s). 
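+ With several candidate instructions the result is a grouped alternation: + e.g. "vadd" and "vmov" yield ((vadd)|(vmov)).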
*) + let insn_regexp = + match insns with + [] -> assert false + | [insn] -> insn + | _ -> + let rec calc_regexp insns cur_regexp = + match insns with + [] -> cur_regexp + | [insn] -> cur_regexp ^ "(" ^ insn ^ "))" + | insn::insns -> calc_regexp insns (cur_regexp ^ "(" ^ insn ^ ")|") + in calc_regexp insns "(" + in + (* Construct regexps to match against the instructions that this + intrinsic expands to. Watch out for any writeback character and + comments after the instruction. *) + let regexps = List.map (fun regexp -> insn_regexp ^ "\\[ \t\\]+" ^ regexp ^ + "!?\\(\\[ \t\\]+@\\[a-zA-Z0-9 \\]+\\)?\\n") + (analyze_all_shapes features shape analyze_shape) + in + (* Emit file and function prologues. *) + emit_prologue chan test_name; + (* Emit local variable declarations. *) + emit_automatics chan c_types; + Printf.fprintf chan "\n"; + (* Emit the call to the intrinsic. *) + emit_call chan const_valuator c_types name elt_ty; + (* Emit the function epilogue and the DejaGNU scan-assembler directives. *) + emit_epilogue chan features regexps; + (* Close the test file. *) + close_out chan + +(* Generate tests for one element of the "ops" table. *) +let test_intrinsic_group dir (opcode, features, shape, name, munge, types) = + List.iter (test_intrinsic dir opcode features shape name munge) types + +(* Program entry point. *) +let _ = + let directory = if Array.length Sys.argv <> 1 then Sys.argv.(1) else "." in + List.iter (test_intrinsic_group directory) (reinterp @ ops) + diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md new file mode 100644 index 0000000..48b4e2a --- /dev/null +++ b/gcc/config/arm/neon.md @@ -0,0 +1,3948 @@ +;; ARM NEON coprocessor Machine Description +;; Copyright (C) 2006 Free Software Foundation, Inc. +;; Written by CodeSourcery. +;; +;; This file is part of GCC. +;; +;; GCC is free software; you can redistribute it and/or modify it +;; under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 2, or (at your option) +;; any later version. +;; +;; GCC is distributed in the hope that it will be useful, but +;; WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +;; General Public License for more details. +;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING. If not, write to the Free +;; Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA +;; 02110-1301, USA. + +;; Constants for unspecs. 
+(define_constants + [(UNSPEC_ASHIFT_SIGNED 65) + (UNSPEC_ASHIFT_UNSIGNED 66) + (UNSPEC_VABA 67) + (UNSPEC_VABAL 68) + (UNSPEC_VABD 69) + (UNSPEC_VABDL 70) + (UNSPEC_VABS 71) + (UNSPEC_VADD 72) + (UNSPEC_VADDHN 73) + (UNSPEC_VADDL 74) + (UNSPEC_VADDW 75) + (UNSPEC_VAND 76) + (UNSPEC_VBIC 77) + (UNSPEC_VBSL 78) + (UNSPEC_VCAGE 79) + (UNSPEC_VCAGT 80) + (UNSPEC_VCEQ 81) + (UNSPEC_VCGE 82) + (UNSPEC_VCGT 83) + (UNSPEC_VCLS 84) + (UNSPEC_VCLZ 85) + (UNSPEC_VCNT 86) + (UNSPEC_VCOMBINE 87) + (UNSPEC_VCVT 88) + (UNSPEC_VCVT_N 89) + (UNSPEC_VDUP_LANE 90) + (UNSPEC_VDUP_N 91) + (UNSPEC_VEOR 92) + (UNSPEC_VEXT 93) + (UNSPEC_VGET_HIGH 94) + (UNSPEC_VGET_LANE 95) + (UNSPEC_VGET_LOW 96) + (UNSPEC_VHADD 97) + (UNSPEC_VHSUB 98) + (UNSPEC_VLD1 99) + (UNSPEC_VLD1_DUP 100) + (UNSPEC_VLD1_LANE 101) + (UNSPEC_VLD2 102) + (UNSPEC_VLD2_DUP 103) + (UNSPEC_VLD2_LANE 104) + (UNSPEC_VLD3 105) + (UNSPEC_VLD3A 106) + (UNSPEC_VLD3B 107) + (UNSPEC_VLD3_DUP 108) + (UNSPEC_VLD3_LANE 109) + (UNSPEC_VLD4 110) + (UNSPEC_VLD4A 111) + (UNSPEC_VLD4B 112) + (UNSPEC_VLD4_DUP 113) + (UNSPEC_VLD4_LANE 114) + (UNSPEC_VMAX 115) + (UNSPEC_VMIN 116) + (UNSPEC_VMLA 117) + (UNSPEC_VMLAL 118) + (UNSPEC_VMLA_LANE 119) + (UNSPEC_VMLAL_LANE 120) + (UNSPEC_VMLS 121) + (UNSPEC_VMLSL 122) + (UNSPEC_VMLS_LANE 123) + (UNSPEC_VMLSL_LANE 124) + (UNSPEC_VMOVL 125) + (UNSPEC_VMOVN 126) + (UNSPEC_VMUL 127) + (UNSPEC_VMULL 128) + (UNSPEC_VMUL_LANE 129) + (UNSPEC_VMULL_LANE 130) + (UNSPEC_VMUL_N 131) + (UNSPEC_VMVN 132) + (UNSPEC_VORN 133) + (UNSPEC_VORR 134) + (UNSPEC_VPADAL 135) + (UNSPEC_VPADD 136) + (UNSPEC_VPADDL 137) + (UNSPEC_VPMAX 138) + (UNSPEC_VPMIN 139) + (UNSPEC_VPSMAX 140) + (UNSPEC_VPSMIN 141) + (UNSPEC_VPUMAX 142) + (UNSPEC_VPUMIN 143) + (UNSPEC_VQABS 144) + (UNSPEC_VQADD 145) + (UNSPEC_VQDMLAL 146) + (UNSPEC_VQDMLAL_LANE 147) + (UNSPEC_VQDMLSL 148) + (UNSPEC_VQDMLSL_LANE 149) + (UNSPEC_VQDMULH 150) + (UNSPEC_VQDMULH_LANE 151) + (UNSPEC_VQDMULL 152) + (UNSPEC_VQDMULL_LANE 153) + (UNSPEC_VQMOVN 154) + (UNSPEC_VQMOVUN 155) + (UNSPEC_VQNEG 156) + (UNSPEC_VQSHL 157) + (UNSPEC_VQSHL_N 158) + (UNSPEC_VQSHLU_N 159) + (UNSPEC_VQSHRN_N 160) + (UNSPEC_VQSHRUN_N 161) + (UNSPEC_VQSUB 162) + (UNSPEC_VRECPE 163) + (UNSPEC_VRECPS 164) + (UNSPEC_VREV16 165) + (UNSPEC_VREV32 166) + (UNSPEC_VREV64 167) + (UNSPEC_VRSQRTE 168) + (UNSPEC_VRSQRTS 169) + (UNSPEC_VSET_LANE 170) + (UNSPEC_VSHL 171) + (UNSPEC_VSHLL_N 172) + (UNSPEC_VSHL_N 173) + (UNSPEC_VSHR_N 174) + (UNSPEC_VSHRN_N 175) + (UNSPEC_VSLI 176) + (UNSPEC_VSRA_N 177) + (UNSPEC_VSRI 178) + (UNSPEC_VST1 179) + (UNSPEC_VST1_LANE 180) + (UNSPEC_VST2 181) + (UNSPEC_VST2_LANE 182) + (UNSPEC_VST3 183) + (UNSPEC_VST3A 184) + (UNSPEC_VST3B 185) + (UNSPEC_VST3_LANE 186) + (UNSPEC_VST4 187) + (UNSPEC_VST4A 188) + (UNSPEC_VST4B 189) + (UNSPEC_VST4_LANE 190) + (UNSPEC_VSTRUCTDUMMY 191) + (UNSPEC_VSUB 192) + (UNSPEC_VSUBHN 193) + (UNSPEC_VSUBL 194) + (UNSPEC_VSUBW 195) + (UNSPEC_VTBL 196) + (UNSPEC_VTBX 197) + (UNSPEC_VTRN1 198) + (UNSPEC_VTRN2 199) + (UNSPEC_VTST 200) + (UNSPEC_VUZP1 201) + (UNSPEC_VUZP2 202) + (UNSPEC_VZIP1 203) + (UNSPEC_VZIP2 204)]) + +;; Double-width vector modes. +(define_mode_macro VD [V8QI V4HI V2SI V2SF]) + +;; Double-width vector modes plus 64-bit elements. +(define_mode_macro VDX [V8QI V4HI V2SI V2SF DI]) + +;; Same, without floating-point elements. +(define_mode_macro VDI [V8QI V4HI V2SI]) + +;; Quad-width vector modes. +(define_mode_macro VQ [V16QI V8HI V4SI V4SF]) + +;; Quad-width vector modes plus 64-bit elements. 
+(define_mode_macro VQX [V16QI V8HI V4SI V4SF V2DI]) + +;; Same, without floating-point elements. +(define_mode_macro VQI [V16QI V8HI V4SI]) + +;; Same, with TImode added, for moves. +(define_mode_macro VQXMOV [V16QI V8HI V4SI V4SF V2DI TI]) + +;; Opaque structure types wider than TImode. +(define_mode_macro VSTRUCT [EI OI CI XI]) + +;; Number of instructions needed to load/store struct elements. FIXME! +(define_mode_attr V_slen [(EI "2") (OI "2") (CI "3") (XI "4")]) + +;; Opaque structure types used in table lookups (except vtbl1/vtbx1). +(define_mode_macro VTAB [TI EI OI]) + +;; vtbl<n> suffix for above modes. +(define_mode_attr VTAB_n [(TI "2") (EI "3") (OI "4")]) + +;; Widenable modes. +(define_mode_macro VW [V8QI V4HI V2SI]) + +;; Narrowable modes. +(define_mode_macro VN [V8HI V4SI V2DI]) + +;; All supported vector modes (except singleton DImode). +(define_mode_macro VDQ [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF V2DI]) + +;; All supported vector modes (except those with 64-bit integer elements). +(define_mode_macro VDQW [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF]) + +;; Supported integer vector modes (no 64-bit elements). +(define_mode_macro VDQIW [V8QI V16QI V4HI V8HI V2SI V4SI]) + +;; Supported integer vector modes (not singleton DI). +(define_mode_macro VDQI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI]) + +;; Vector modes, including 64-bit integer elements. +(define_mode_macro VDQX [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF DI V2DI]) + +;; Vector modes including 64-bit integer elements, but no floats. +(define_mode_macro VDQIX [V8QI V16QI V4HI V8HI V2SI V4SI DI V2DI]) + +;; Vector modes for float->int conversions. +(define_mode_macro VCVTF [V2SF V4SF]) + +;; Vector modes for int->float conversions. +(define_mode_macro VCVTI [V2SI V4SI]) + +;; Vector modes for doubleword multiply-accumulate, etc. insns. +(define_mode_macro VMD [V4HI V2SI V2SF]) + +;; Vector modes for quadword multiply-accumulate, etc. insns. +(define_mode_macro VMQ [V8HI V4SI V4SF]) + +;; Above modes combined. +(define_mode_macro VMDQ [V4HI V2SI V2SF V8HI V4SI V4SF]) + +;; As VMD, but integer modes only. +(define_mode_macro VMDI [V4HI V2SI]) + +;; As VMQ, but integer modes only. +(define_mode_macro VMQI [V8HI V4SI]) + +;; Above modes combined. +(define_mode_macro VMDQI [V4HI V2SI V8HI V4SI]) + +;; Modes with 8-bit and 16-bit elements. +(define_mode_macro VX [V8QI V4HI V16QI V8HI]) + +;; Modes with 8-bit elements. +(define_mode_macro VE [V8QI V16QI]) + +;; Modes with 64-bit elements only. +(define_mode_macro V64 [DI V2DI]) + +;; Modes with 32-bit elements only. +(define_mode_macro V32 [V2SI V2SF V4SI V4SF]) + +;; (Opposite) mode to convert to/from for above conversions. +(define_mode_attr V_CVTTO [(V2SI "V2SF") (V2SF "V2SI") + (V4SI "V4SF") (V4SF "V4SI")]) + +;; Define element mode for each vector mode. +(define_mode_attr V_elem [(V8QI "QI") (V16QI "QI") + (V4HI "HI") (V8HI "HI") + (V2SI "SI") (V4SI "SI") + (V2SF "SF") (V4SF "SF") + (DI "DI") (V2DI "DI")]) + +;; Mode of pair of elements for each vector mode, to define transfer +;; size for structure lane/dup loads and stores. +(define_mode_attr V_two_elem [(V8QI "HI") (V16QI "HI") + (V4HI "SI") (V8HI "SI") + (V2SI "V2SI") (V4SI "V2SI") + (V2SF "V2SF") (V4SF "V2SF") + (DI "V2DI") (V2DI "V2DI")]) + +;; Similar, for three elements. +;; ??? Should we define extra modes so that sizes of all three-element +;; accesses can be accurately represented?
+(define_mode_attr V_three_elem [(V8QI "SI") (V16QI "SI") + (V4HI "V4HI") (V8HI "V4HI") + (V2SI "V4SI") (V4SI "V4SI") + (V2SF "V4SF") (V4SF "V4SF") + (DI "EI") (V2DI "EI")]) + +;; Similar, for four elements. +(define_mode_attr V_four_elem [(V8QI "SI") (V16QI "SI") + (V4HI "V4HI") (V8HI "V4HI") + (V2SI "V4SI") (V4SI "V4SI") + (V2SF "V4SF") (V4SF "V4SF") + (DI "OI") (V2DI "OI")]) + +;; Register width from element mode +(define_mode_attr V_reg [(V8QI "P") (V16QI "q") + (V4HI "P") (V8HI "q") + (V2SI "P") (V4SI "q") + (V2SF "P") (V4SF "q") + (DI "P") (V2DI "q")]) + +;; Wider modes with the same number of elements. +(define_mode_attr V_widen [(V8QI "V8HI") (V4HI "V4SI") (V2SI "V2DI")]) + +;; Narrower modes with the same number of elements. +(define_mode_attr V_narrow [(V8HI "V8QI") (V4SI "V4HI") (V2DI "V2SI")]) + +;; Modes with half the number of equal-sized elements. +(define_mode_attr V_HALF [(V16QI "V8QI") (V8HI "V4HI") + (V4SI "V2SI") (V4SF "V2SF") + (V2DI "DI")]) + +;; Same, but lower-case. +(define_mode_attr V_half [(V16QI "v8qi") (V8HI "v4hi") + (V4SI "v2si") (V4SF "v2sf") + (V2DI "di")]) + +;; Modes with twice the number of equal-sized elements. +(define_mode_attr V_DOUBLE [(V8QI "V16QI") (V4HI "V8HI") + (V2SI "V4SI") (V2SF "V4SF") + (DI "V2DI")]) + +;; Same, but lower-case. +(define_mode_attr V_double [(V8QI "v16qi") (V4HI "v8hi") + (V2SI "v4si") (V2SF "v4sf") + (DI "v2di")]) + +;; Modes with double-width elements. +(define_mode_attr V_double_width [(V8QI "V4HI") (V16QI "V8HI") + (V4HI "V2SI") (V8HI "V4SI") + (V2SI "DI") (V4SI "V2DI")]) + +;; Mode of result of comparison operations (and bit-select operand 1). +(define_mode_attr V_cmp_result [(V8QI "V8QI") (V16QI "V16QI") + (V4HI "V4HI") (V8HI "V8HI") + (V2SI "V2SI") (V4SI "V4SI") + (V2SF "V2SI") (V4SF "V4SI") + (DI "DI") (V2DI "V2DI")]) + +;; Get element type from double-width mode, for operations where we don't care +;; about signedness. +(define_mode_attr V_if_elem [(V8QI "i8") (V16QI "i8") + (V4HI "i16") (V8HI "i16") + (V2SI "i32") (V4SI "i32") + (DI "i64") (V2DI "i64") + (V2SF "f32") (V4SF "f32")]) + +;; Same, but for operations which work on signed values. +(define_mode_attr V_s_elem [(V8QI "s8") (V16QI "s8") + (V4HI "s16") (V8HI "s16") + (V2SI "s32") (V4SI "s32") + (DI "s64") (V2DI "s64") + (V2SF "f32") (V4SF "f32")]) + +;; Same, but for operations which work on unsigned values. +(define_mode_attr V_u_elem [(V8QI "u8") (V16QI "u8") + (V4HI "u16") (V8HI "u16") + (V2SI "u32") (V4SI "u32") + (DI "u64") (V2DI "u64") + (V2SF "f32") (V4SF "f32")]) + +;; Element types for extraction of unsigned scalars. +(define_mode_attr V_uf_sclr [(V8QI "u8") (V16QI "u8") + (V4HI "u16") (V8HI "u16") + (V2SI "32") (V4SI "32") + (V2SF "32") (V4SF "32")]) + +(define_mode_attr V_sz_elem [(V8QI "8") (V16QI "8") + (V4HI "16") (V8HI "16") + (V2SI "32") (V4SI "32") + (DI "64") (V2DI "64") + (V2SF "32") (V4SF "32")]) + +;; Element sizes for duplicating ARM registers to all elements of a vector. +(define_mode_attr VD_dup [(V8QI "8") (V4HI "16") (V2SI "32") (V2SF "32")]) + +;; Opaque integer types for results of pair-forming intrinsics (vtrn, etc.) +(define_mode_attr V_PAIR [(V8QI "TI") (V16QI "OI") + (V4HI "TI") (V8HI "OI") + (V2SI "TI") (V4SI "OI") + (V2SF "TI") (V4SF "OI") + (DI "TI") (V2DI "OI")]) + +;; Same, but lower-case. +(define_mode_attr V_pair [(V8QI "ti") (V16QI "oi") + (V4HI "ti") (V8HI "oi") + (V2SI "ti") (V4SI "oi") + (V2SF "ti") (V4SF "oi") + (DI "ti") (V2DI "oi")]) + +;; Operations on two halves of a quadword vector. 
+(define_code_macro vqh_ops [plus smin smax umin umax]) + +;; Same, without unsigned variants (for use with *SFmode pattern). +(define_code_macro vqhs_ops [plus smin smax]) + +;; Assembler mnemonics for above codes. +(define_code_attr VQH_mnem [(plus "vadd") (smin "vmin") (smax "vmax") + (umin "vmin") (umax "vmax")]) + +;; Signs of above, where relevant. +(define_code_attr VQH_sign [(plus "i") (smin "s") (smax "s") (umin "u") + (umax "u")]) + +;; Extra suffix on some 64-bit insn names (to avoid collision with standard +;; names which we don't want to define). +(define_mode_attr V_suf64 [(V8QI "") (V16QI "") + (V4HI "") (V8HI "") + (V2SI "") (V4SI "") + (V2SF "") (V4SF "") + (DI "_neon") (V2DI "")]) + +;; Scalars to be presented to scalar multiplication instructions +;; must satisfy the following constraints. +;; 1. If the mode specifies 16-bit elements, the scalar must be in D0-D7. +;; 2. If the mode specifies 32-bit elements, the scalar must be in D0-D15. +;; This mode attribute is used to obtain the correct register constraints. +(define_mode_attr scalar_mul_constraint [(V4HI "x") (V2SI "t") (V2SF "t") + (V8HI "x") (V4SI "t") (V4SF "t")]) + +(define_insn "*neon_mov<mode>" + [(set (match_operand:VD 0 "nonimmediate_operand" + "=w,Uv,w, w, ?r,?w,?r,?r, ?Us") + (match_operand:VD 1 "general_operand" + " w,w, Dn,Uvi, w, r, r, Usi,r"))] + "TARGET_NEON" +{ + if (which_alternative == 2) + { + int width, is_valid; + static char templ[40]; + + is_valid = neon_immediate_valid_for_move (operands[1], <MODE>mode, + &operands[1], &width); + + gcc_assert (is_valid != 0); + + if (width == 0) + return "vmov.f32\t%P0, %1 @ <mode>"; + else + sprintf (templ, "vmov.i%d\t%%P0, %%1 @ <mode>", width); + + return templ; + } + + /* FIXME: If the memory layout is changed in big-endian mode, output_move_vfp + below must be changed to output_move_neon (which will use the + element/structure loads/stores), and the constraint changed to 'Un' instead + of 'Uv'. 
*/ + + switch (which_alternative) + { + case 0: return "vmov\t%P0, %P1 @ <mode>"; + case 1: case 3: return output_move_vfp (operands); + case 2: gcc_unreachable (); + case 4: return "vmov\t%Q0, %R0, %P1 @ <mode>"; + case 5: return "vmov\t%P0, %Q1, %R1 @ <mode>"; + default: return output_move_double (operands); + } +} + [(set_attr "type" "farith,f_stored,farith,f_loadd,f_2_r,r_2_f,*,load2,store2") + (set_attr "length" "4,4,4,4,4,4,8,8,8") + (set_attr "pool_range" "*,*,*,1020,*,*,*,1020,*") + (set_attr "neg_pool_range" "*,*,*,1008,*,*,*,1008,*")]) + +(define_insn "*neon_mov<mode>" + [(set (match_operand:VQXMOV 0 "nonimmediate_operand" + "=w,Un,w, w, ?r,?w,?r,?r, ?Us") + (match_operand:VQXMOV 1 "general_operand" + " w,w, Dn,Uni, w, r, r, Usi, r"))] + "TARGET_NEON" +{ + if (which_alternative == 2) + { + int width, is_valid; + static char templ[40]; + + is_valid = neon_immediate_valid_for_move (operands[1], <MODE>mode, + &operands[1], &width); + + gcc_assert (is_valid != 0); + + if (width == 0) + return "vmov.f32\t%q0, %1 @ <mode>"; + else + sprintf (templ, "vmov.i%d\t%%q0, %%1 @ <mode>", width); + + return templ; + } + + switch (which_alternative) + { + case 0: return "vmov\t%q0, %q1 @ <mode>"; + case 1: case 3: return output_move_neon (operands); + case 2: gcc_unreachable (); + case 4: return "vmov\t%Q0, %R0, %e1 @ <mode>\;vmov\t%J0, %K0, %f1"; + case 5: return "vmov\t%e0, %Q1, %R1 @ <mode>\;vmov\t%f0, %J1, %K1"; + default: return output_move_quad (operands); + } +} + [(set_attr "type" "farith,f_stored,farith,f_loadd,f_2_r,r_2_f,*,load2,store2") + (set_attr "length" "4,8,4,8,8,8,16,8,16") + (set_attr "pool_range" "*,*,*,1020,*,*,*,1020,*") + (set_attr "neg_pool_range" "*,*,*,1008,*,*,*,1008,*")]) + +(define_expand "movti" + [(set (match_operand:TI 0 "nonimmediate_operand" "") + (match_operand:TI 1 "general_operand" ""))] + "TARGET_NEON" +{ +}) + +(define_expand "mov<mode>" + [(set (match_operand:VSTRUCT 0 "nonimmediate_operand" "") + (match_operand:VSTRUCT 1 "general_operand" ""))] + "TARGET_NEON" +{ +}) + +(define_insn "*neon_mov<mode>" + [(set (match_operand:VSTRUCT 0 "nonimmediate_operand" "=w,Ut,w") + (match_operand:VSTRUCT 1 "general_operand" " w,w, Ut"))] + "TARGET_NEON" +{ + switch (which_alternative) + { + case 0: return "#"; + case 1: case 2: return output_move_neon (operands); + default: gcc_unreachable (); + } +} + [(set_attr "length" "<V_slen>,<V_slen>,<V_slen>")]) + +(define_split + [(set (match_operand:EI 0 "s_register_operand" "") + (match_operand:EI 1 "s_register_operand" ""))] + "TARGET_NEON && reload_completed" + [(set (match_dup 0) (match_dup 1)) + (set (match_dup 2) (match_dup 3))] +{ + int rdest = REGNO (operands[0]); + int rsrc = REGNO (operands[1]); + rtx dest[2], src[2]; + + dest[0] = gen_rtx_REG (TImode, rdest); + src[0] = gen_rtx_REG (TImode, rsrc); + dest[1] = gen_rtx_REG (DImode, rdest + 4); + src[1] = gen_rtx_REG (DImode, rsrc + 4); + + neon_disambiguate_copy (operands, dest, src, 2); +}) + +(define_split + [(set (match_operand:OI 0 "s_register_operand" "") + (match_operand:OI 1 "s_register_operand" ""))] + "TARGET_NEON && reload_completed" + [(set (match_dup 0) (match_dup 1)) + (set (match_dup 2) (match_dup 3))] +{ + int rdest = REGNO (operands[0]); + int rsrc = REGNO (operands[1]); + rtx dest[2], src[2]; + + dest[0] = gen_rtx_REG (TImode, rdest); + src[0] = gen_rtx_REG (TImode, rsrc); + dest[1] = gen_rtx_REG (TImode, rdest + 4); + src[1] = gen_rtx_REG (TImode, rsrc + 4); + + neon_disambiguate_copy (operands, dest, src, 2); +}) + +(define_split + [(set 
(match_operand:CI 0 "s_register_operand" "") + (match_operand:CI 1 "s_register_operand" ""))] + "TARGET_NEON && reload_completed" + [(set (match_dup 0) (match_dup 1)) + (set (match_dup 2) (match_dup 3)) + (set (match_dup 4) (match_dup 5))] +{ + int rdest = REGNO (operands[0]); + int rsrc = REGNO (operands[1]); + rtx dest[3], src[3]; + + dest[0] = gen_rtx_REG (TImode, rdest); + src[0] = gen_rtx_REG (TImode, rsrc); + dest[1] = gen_rtx_REG (TImode, rdest + 4); + src[1] = gen_rtx_REG (TImode, rsrc + 4); + dest[2] = gen_rtx_REG (TImode, rdest + 8); + src[2] = gen_rtx_REG (TImode, rsrc + 8); + + neon_disambiguate_copy (operands, dest, src, 3); +}) + +(define_split + [(set (match_operand:XI 0 "s_register_operand" "") + (match_operand:XI 1 "s_register_operand" ""))] + "TARGET_NEON && reload_completed" + [(set (match_dup 0) (match_dup 1)) + (set (match_dup 2) (match_dup 3)) + (set (match_dup 4) (match_dup 5)) + (set (match_dup 6) (match_dup 7))] +{ + int rdest = REGNO (operands[0]); + int rsrc = REGNO (operands[1]); + rtx dest[4], src[4]; + + dest[0] = gen_rtx_REG (TImode, rdest); + src[0] = gen_rtx_REG (TImode, rsrc); + dest[1] = gen_rtx_REG (TImode, rdest + 4); + src[1] = gen_rtx_REG (TImode, rsrc + 4); + dest[2] = gen_rtx_REG (TImode, rdest + 8); + src[2] = gen_rtx_REG (TImode, rsrc + 8); + dest[3] = gen_rtx_REG (TImode, rdest + 12); + src[3] = gen_rtx_REG (TImode, rsrc + 12); + + neon_disambiguate_copy (operands, dest, src, 4); +}) + +(define_insn "vec_set<mode>" + [(set (match_operand:VD 0 "s_register_operand" "+w") + (vec_merge:VD + (match_operand:VD 3 "s_register_operand" "0") + (vec_duplicate:VD + (match_operand:<V_elem> 1 "s_register_operand" "r")) + (ashift:SI (const_int 1) + (match_operand:SI 2 "immediate_operand" "i"))))] + "TARGET_NEON" + "vmov%?.<V_uf_sclr>\t%P0[%c2], %1" + [(set_attr "predicable" "yes")]) + +(define_insn "vec_set<mode>" + [(set (match_operand:VQ 0 "s_register_operand" "+w") + (vec_merge:VQ + (match_operand:VQ 3 "s_register_operand" "0") + (vec_duplicate:VQ + (match_operand:<V_elem> 1 "s_register_operand" "r")) + (ashift:SI (const_int 1) + (match_operand:SI 2 "immediate_operand" "i"))))] + "TARGET_NEON" +{ + int half_elts = GET_MODE_NUNITS (<MODE>mode) / 2; + int elt = INTVAL (operands[2]) % half_elts; + int hi = (INTVAL (operands[2]) / half_elts) * 2; + int regno = REGNO (operands[0]); + + operands[0] = gen_rtx_REG (<V_HALF>mode, regno + hi); + operands[2] = GEN_INT (elt); + + return "vmov%?.<V_uf_sclr>\t%P0[%c2], %1"; +} + [(set_attr "predicable" "yes")]) + +(define_insn "vec_setv2di" + [(set (match_operand:V2DI 0 "s_register_operand" "+w") + (vec_merge:V2DI + (match_operand:V2DI 3 "s_register_operand" "0") + (vec_duplicate:V2DI + (match_operand:DI 1 "s_register_operand" "r")) + (ashift:SI (const_int 1) + (match_operand:SI 2 "immediate_operand" "i"))))] + "TARGET_NEON" +{ + int regno = REGNO (operands[0]) + INTVAL (operands[2]); + + operands[0] = gen_rtx_REG (DImode, regno); + + return "vmov%?.64\t%P0, %Q1, %R1"; +} + [(set_attr "predicable" "yes")]) + +(define_insn "vec_extract<mode>" + [(set (match_operand:<V_elem> 0 "s_register_operand" "=r") + (vec_select:<V_elem> + (match_operand:VD 1 "s_register_operand" "w") + (parallel [(match_operand:SI 2 "immediate_operand" "i")])))] + "TARGET_NEON" + "vmov%?.<V_uf_sclr>\t%0, %P1[%c2]" + [(set_attr "predicable" "yes")]) + +(define_insn "vec_extract<mode>" + [(set (match_operand:<V_elem> 0 "s_register_operand" "=r") + (vec_select:<V_elem> + (match_operand:VQ 1 "s_register_operand" "w") + (parallel [(match_operand:SI 2 
"immediate_operand" "i")])))] + "TARGET_NEON" +{ + int half_elts = GET_MODE_NUNITS (<MODE>mode) / 2; + int elt = INTVAL (operands[2]) % half_elts; + int hi = (INTVAL (operands[2]) / half_elts) * 2; + int regno = REGNO (operands[1]); + + operands[1] = gen_rtx_REG (<V_HALF>mode, regno + hi); + operands[2] = GEN_INT (elt); + + return "vmov%?.<V_uf_sclr>\t%0, %P1[%c2]"; +} + [(set_attr "predicable" "yes")]) + +(define_insn "vec_extractv2di" + [(set (match_operand:DI 0 "s_register_operand" "=r") + (vec_select:DI + (match_operand:V2DI 1 "s_register_operand" "w") + (parallel [(match_operand:SI 2 "immediate_operand" "i")])))] + "TARGET_NEON" +{ + int regno = REGNO (operands[1]) + INTVAL (operands[2]); + + operands[1] = gen_rtx_REG (DImode, regno); + + return "vmov%?.64\t%Q0, %R0, %P1"; +} + [(set_attr "predicable" "yes")]) + +(define_expand "vec_init<mode>" + [(match_operand:VDQ 0 "s_register_operand" "") + (match_operand 1 "" "")] + "TARGET_NEON" +{ + neon_expand_vector_init (operands[0], operands[1]); + DONE; +}) + +;; Doubleword and quadword arithmetic. + +;; NOTE: vadd/vsub and some other instructions also support 64-bit integer +;; element size, which we could potentially use for "long long" operations. We +;; don't want to do this at present though, because moving values from the +;; vector unit to the ARM core is currently slow and 64-bit addition (etc.) is +;; easy to do with ARM instructions anyway. + +(define_insn "*add<mode>3_neon" + [(set (match_operand:VDQ 0 "s_register_operand" "=w") + (plus:VDQ (match_operand:VDQ 1 "s_register_operand" "w") + (match_operand:VDQ 2 "s_register_operand" "w")))] + "TARGET_NEON" + "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "*sub<mode>3_neon" + [(set (match_operand:VDQ 0 "s_register_operand" "=w") + (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "w") + (match_operand:VDQ 2 "s_register_operand" "w")))] + "TARGET_NEON" + "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "*mul<mode>3_neon" + [(set (match_operand:VDQ 0 "s_register_operand" "=w") + (mult:VDQ (match_operand:VDQ 1 "s_register_operand" "w") + (match_operand:VDQ 2 "s_register_operand" "w")))] + "TARGET_NEON" + "vmul.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "ior<mode>3" + [(set (match_operand:VDQ 0 "s_register_operand" "=w,w") + (ior:VDQ (match_operand:VDQ 1 "s_register_operand" "w,0") + (match_operand:VDQ 2 "neon_logic_op2" "w,Dl")))] + "TARGET_NEON" +{ + switch (which_alternative) + { + case 0: return "vorr\t%<V_reg>0, %<V_reg>1, %<V_reg>2"; + case 1: return neon_output_logic_immediate ("vorr", &operands[2], + <MODE>mode, 0, VALID_NEON_QREG_MODE (<MODE>mode)); + default: gcc_unreachable (); + } +}) + +(define_insn "iordi3_neon" + [(set (match_operand:DI 0 "s_register_operand" "=w,w") + (unspec:DI [(match_operand:DI 1 "s_register_operand" "w,0") + (match_operand:DI 2 "neon_logic_op2" "w,Dl")] + UNSPEC_VORR))] + "TARGET_NEON" +{ + switch (which_alternative) + { + case 0: return "vorr\t%P0, %P1, %P2"; + case 1: return neon_output_logic_immediate ("vorr", &operands[2], + DImode, 0, VALID_NEON_QREG_MODE (DImode)); + default: gcc_unreachable (); + } +}) + +;; The concrete forms of the Neon immediate-logic instructions are vbic and +;; vorr. We support the pseudo-instruction vand instead, because that +;; corresponds to the canonical form the middle-end expects to use for +;; immediate bitwise-ANDs. 
+ +(define_insn "and<mode>3" + [(set (match_operand:VDQ 0 "s_register_operand" "=w,w") + (and:VDQ (match_operand:VDQ 1 "s_register_operand" "w,0") + (match_operand:VDQ 2 "neon_inv_logic_op2" "w,DL")))] + "TARGET_NEON" +{ + switch (which_alternative) + { + case 0: return "vand\t%<V_reg>0, %<V_reg>1, %<V_reg>2"; + case 1: return neon_output_logic_immediate ("vand", &operands[2], + <MODE>mode, 1, VALID_NEON_QREG_MODE (<MODE>mode)); + default: gcc_unreachable (); + } +}) + +(define_insn "anddi3_neon" + [(set (match_operand:DI 0 "s_register_operand" "=w,w") + (unspec:DI [(match_operand:DI 1 "s_register_operand" "w,0") + (match_operand:DI 2 "neon_inv_logic_op2" "w,DL")] + UNSPEC_VAND))] + "TARGET_NEON" +{ + switch (which_alternative) + { + case 0: return "vand\t%P0, %P1, %P2"; + case 1: return neon_output_logic_immediate ("vand", &operands[2], + DImode, 1, VALID_NEON_QREG_MODE (DImode)); + default: gcc_unreachable (); + } +}) + +(define_insn "orn<mode>3_neon" + [(set (match_operand:VDQ 0 "s_register_operand" "=w") + (ior:VDQ (match_operand:VDQ 1 "s_register_operand" "w") + (not:VDQ (match_operand:VDQ 2 "s_register_operand" "w"))))] + "TARGET_NEON" + "vorn\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "orndi3_neon" + [(set (match_operand:DI 0 "s_register_operand" "=w") + (unspec:DI [(match_operand:DI 1 "s_register_operand" "w") + (match_operand:DI 2 "s_register_operand" "w")] + UNSPEC_VORN))] + "TARGET_NEON" + "vorn\t%P0, %P1, %P2") + +(define_insn "bic<mode>3_neon" + [(set (match_operand:VDQ 0 "s_register_operand" "=w") + (and:VDQ (match_operand:VDQ 1 "s_register_operand" "w") + (not:VDQ (match_operand:VDQ 2 "s_register_operand" "w"))))] + "TARGET_NEON" + "vbic\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "bicdi3_neon" + [(set (match_operand:DI 0 "s_register_operand" "=w") + (unspec:DI [(match_operand:DI 1 "s_register_operand" "w") + (match_operand:DI 2 "s_register_operand" "w")] + UNSPEC_VBIC))] + "TARGET_NEON" + "vbic\t%P0, %P1, %P2") + +(define_insn "xor<mode>3" + [(set (match_operand:VDQ 0 "s_register_operand" "=w") + (xor:VDQ (match_operand:VDQ 1 "s_register_operand" "w") + (match_operand:VDQ 2 "s_register_operand" "w")))] + "TARGET_NEON" + "veor\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "xordi3_neon" + [(set (match_operand:DI 0 "s_register_operand" "=w") + (unspec:DI [(match_operand:DI 1 "s_register_operand" "w") + (match_operand:DI 2 "s_register_operand" "w")] + UNSPEC_VEOR))] + "TARGET_NEON" + "veor\t%P0, %P1, %P2") + +(define_insn "one_cmpl<mode>2" + [(set (match_operand:VDQ 0 "s_register_operand" "=w") + (not:VDQ (match_operand:VDQ 1 "s_register_operand" "w")))] + "TARGET_NEON" + "vmvn\t%<V_reg>0, %<V_reg>1") + +(define_insn "abs<mode>2" + [(set (match_operand:VDQW 0 "s_register_operand" "=w") + (abs:VDQW (match_operand:VDQW 1 "s_register_operand" "w")))] + "TARGET_NEON" + "vabs.<V_s_elem>\t%<V_reg>0, %<V_reg>1") + +(define_insn "neg<mode>2" + [(set (match_operand:VDQW 0 "s_register_operand" "=w") + (neg:VDQW (match_operand:VDQW 1 "s_register_operand" "w")))] + "TARGET_NEON" + "vneg.<V_s_elem>\t%<V_reg>0, %<V_reg>1") + +(define_insn "*umin<mode>3_neon" + [(set (match_operand:VDQIW 0 "s_register_operand" "=w") + (umin:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w") + (match_operand:VDQIW 2 "s_register_operand" "w")))] + "TARGET_NEON" + "vmin.<V_u_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "*umax<mode>3_neon" + [(set (match_operand:VDQIW 0 "s_register_operand" "=w") + (umax:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w") + 
(match_operand:VDQIW 2 "s_register_operand" "w")))] + "TARGET_NEON" + "vmax.<V_u_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "*smin<mode>3_neon" + [(set (match_operand:VDQW 0 "s_register_operand" "=w") + (smin:VDQW (match_operand:VDQW 1 "s_register_operand" "w") + (match_operand:VDQW 2 "s_register_operand" "w")))] + "TARGET_NEON" + "vmin.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "*smax<mode>3_neon" + [(set (match_operand:VDQW 0 "s_register_operand" "=w") + (smax:VDQW (match_operand:VDQW 1 "s_register_operand" "w") + (match_operand:VDQW 2 "s_register_operand" "w")))] + "TARGET_NEON" + "vmax.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +; TODO: V2DI shifts are currently disabled because there are bugs in the +; generic vectorizer code. It ends up creating a V2DI constructor with +; SImode elements. + +(define_insn "ashl<mode>3" + [(set (match_operand:VDQIW 0 "s_register_operand" "=w") + (ashift:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w") + (match_operand:VDQIW 2 "s_register_operand" "w")))] + "TARGET_NEON" + "vshl.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +; Used for implementing arithmetic shift-right, which is a left-shift by a +; negative amount, with signed operands. This is essentially the same as +; ashl<mode>3 above, but using an unspec in case GCC tries anything tricky +; with negative shift amounts. + +(define_insn "ashl<mode>3_signed" + [(set (match_operand:VDQI 0 "s_register_operand" "=w") + (unspec:VDQI [(match_operand:VDQI 1 "s_register_operand" "w") + (match_operand:VDQI 2 "s_register_operand" "w")] + UNSPEC_ASHIFT_SIGNED))] + "TARGET_NEON" + "vshl.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +; Used for implementing logical shift-right, which is a left-shift by a negative +; amount, with unsigned operands.
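Both helper patterns rely on the NEON convention that vshl with a negative count shifts right. A single-lane model in C (the vshl_* helpers are illustrative only; assumes GCC's arithmetic >> on signed values):

#include <assert.h>
#include <stdint.h>

/* Model of vshl.s32/vshl.u32 on one lane: negative counts shift right,
   which is what the ashr and lshr expanders below depend on.  */
static int32_t  vshl_s32 (int32_t x, int n)  { return n >= 0 ? x << n : x >> -n; }
static uint32_t vshl_u32 (uint32_t x, int n) { return n >= 0 ? x << n : x >> -n; }

int main (void)
{
  assert (vshl_s32 (-16, -2) == -4); /* arithmetic shift right */
  assert (vshl_u32 (16, -2) == 4);   /* logical shift right */
  return 0;
}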
+ +(define_insn "ashl<mode>3_unsigned" + [(set (match_operand:VDQI 0 "s_register_operand" "=w") + (unspec:VDQI [(match_operand:VDQI 1 "s_register_operand" "w") + (match_operand:VDQI 2 "s_register_operand" "w")] + UNSPEC_ASHIFT_UNSIGNED))] + "TARGET_NEON" + "vshl.<V_u_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_expand "ashr<mode>3" + [(set (match_operand:VDQIW 0 "s_register_operand" "") + (ashiftrt:VDQIW (match_operand:VDQIW 1 "s_register_operand" "") + (match_operand:VDQIW 2 "s_register_operand" "")))] + "TARGET_NEON" +{ + rtx neg = gen_reg_rtx (<MODE>mode); + + emit_insn (gen_neg<mode>2 (neg, operands[2])); + emit_insn (gen_ashl<mode>3_signed (operands[0], operands[1], neg)); + + DONE; +}) + +(define_expand "lshr<mode>3" + [(set (match_operand:VDQIW 0 "s_register_operand" "") + (lshiftrt:VDQIW (match_operand:VDQIW 1 "s_register_operand" "") + (match_operand:VDQIW 2 "s_register_operand" "")))] + "TARGET_NEON" +{ + rtx neg = gen_reg_rtx (<MODE>mode); + + emit_insn (gen_neg<mode>2 (neg, operands[2])); + emit_insn (gen_ashl<mode>3_unsigned (operands[0], operands[1], neg)); + + DONE; +}) + +;; Widening operations + +(define_insn "widen_ssum<mode>3" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (plus:<V_widen> (sign_extend:<V_widen> + (match_operand:VW 1 "s_register_operand" "%w")) + (match_operand:<V_widen> 2 "s_register_operand" "w")))] + "TARGET_NEON" + "vaddw.<V_s_elem>\t%q0, %q2, %P1") + +(define_insn "widen_usum<mode>3" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (plus:<V_widen> (zero_extend:<V_widen> + (match_operand:VW 1 "s_register_operand" "%w")) + (match_operand:<V_widen> 2 "s_register_operand" "w")))] + "TARGET_NEON" + "vaddw.<V_u_elem>\t%q0, %q2, %P1") + +;; VEXT can be used to synthesize coarse whole-vector shifts with 8-bit +;; shift-count granularity. That's good enough for the middle-end's current +;; needs. + +(define_expand "vec_shr_<mode>" + [(match_operand:VDQ 0 "s_register_operand" "") + (match_operand:VDQ 1 "s_register_operand" "") + (match_operand:SI 2 "const_multiple_of_8_operand" "")] + "TARGET_NEON" +{ + rtx zero_reg; + HOST_WIDE_INT num_bits = INTVAL (operands[2]); + const int width = GET_MODE_BITSIZE (<MODE>mode); + const enum machine_mode bvecmode = (width == 128) ? V16QImode : V8QImode; + rtx (*gen_ext) (rtx, rtx, rtx, rtx) = + (width == 128) ? gen_neon_vextv16qi : gen_neon_vextv8qi; + + if (num_bits == width) + { + emit_move_insn (operands[0], operands[1]); + DONE; + } + + zero_reg = force_reg (bvecmode, CONST0_RTX (bvecmode)); + operands[0] = gen_lowpart (bvecmode, operands[0]); + operands[1] = gen_lowpart (bvecmode, operands[1]); + + emit_insn (gen_ext (operands[0], operands[1], zero_reg, + GEN_INT (num_bits / BITS_PER_UNIT))); + DONE; +}) + +(define_expand "vec_shl_<mode>" + [(match_operand:VDQ 0 "s_register_operand" "") + (match_operand:VDQ 1 "s_register_operand" "") + (match_operand:SI 2 "const_multiple_of_8_operand" "")] + "TARGET_NEON" +{ + rtx zero_reg; + HOST_WIDE_INT num_bits = INTVAL (operands[2]); + const int width = GET_MODE_BITSIZE (<MODE>mode); + const enum machine_mode bvecmode = (width == 128) ? V16QImode : V8QImode; + rtx (*gen_ext) (rtx, rtx, rtx, rtx) = + (width == 128) ? 
gen_neon_vextv16qi : gen_neon_vextv8qi; + + if (num_bits == 0) + { + emit_move_insn (operands[0], CONST0_RTX (<MODE>mode)); + DONE; + } + + num_bits = width - num_bits; + + zero_reg = force_reg (bvecmode, CONST0_RTX (bvecmode)); + operands[0] = gen_lowpart (bvecmode, operands[0]); + operands[1] = gen_lowpart (bvecmode, operands[1]); + + emit_insn (gen_ext (operands[0], zero_reg, operands[1], + GEN_INT (num_bits / BITS_PER_UNIT))); + DONE; +}) + +;; Helpers for quad-word reduction operations + +; Add (or smin, smax...) the low N/2 elements of the N-element vector +; operand[1] to the high N/2 elements of same. Put the result in operand[0], an +; N/2-element vector. + +(define_insn "quad_halves_<code>v4si" + [(set (match_operand:V2SI 0 "s_register_operand" "=w") + (vqh_ops:V2SI + (vec_select:V2SI (match_operand:V4SI 1 "s_register_operand" "w") + (parallel [(const_int 0) (const_int 1)])) + (vec_select:V2SI (match_dup 1) + (parallel [(const_int 2) (const_int 3)]))))] + "TARGET_NEON" + "<VQH_mnem>.<VQH_sign>32\t%P0, %e1, %f1") + +(define_insn "quad_halves_<code>v4sf" + [(set (match_operand:V2SF 0 "s_register_operand" "=w") + (vqhs_ops:V2SF + (vec_select:V2SF (match_operand:V4SF 1 "s_register_operand" "w") + (parallel [(const_int 0) (const_int 1)])) + (vec_select:V2SF (match_dup 1) + (parallel [(const_int 2) (const_int 3)]))))] + "TARGET_NEON" + "<VQH_mnem>.f32\t%P0, %e1, %f1") + +(define_insn "quad_halves_<code>v8hi" + [(set (match_operand:V4HI 0 "s_register_operand" "+w") + (vqh_ops:V4HI + (vec_select:V4HI (match_operand:V8HI 1 "s_register_operand" "w") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3)])) + (vec_select:V4HI (match_dup 1) + (parallel [(const_int 4) (const_int 5) + (const_int 6) (const_int 7)]))))] + "TARGET_NEON" + "<VQH_mnem>.<VQH_sign>16\t%P0, %e1, %f1") + +(define_insn "quad_halves_<code>v16qi" + [(set (match_operand:V8QI 0 "s_register_operand" "+w") + (vqh_ops:V8QI + (vec_select:V8QI (match_operand:V16QI 1 "s_register_operand" "w") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)])) + (vec_select:V8QI (match_dup 1) + (parallel [(const_int 8) (const_int 9) + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15)]))))] + "TARGET_NEON" + "<VQH_mnem>.<VQH_sign>8\t%P0, %e1, %f1") + +; FIXME: We wouldn't need the following insns if we could write subregs of +; vector registers. Make an attempt at removing unnecessary moves, though +; we're really at the mercy of the register allocator. 
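The quad_halves patterns above supply one halving step of a log2-style reduction; the reduc_* expanders below chain such steps. A scalar model of the strategy in C (reduce_add is illustrative, not a GCC function):

#include <assert.h>

/* Fold the high half of an n-element vector into the low half until one
   element remains, as the quad-halves reductions do.  */
static int reduce_add (int *v, int n)
{
  for (; n > 1; n /= 2)
    for (int i = 0; i < n / 2; i++)
      v[i] += v[i + n / 2];
  return v[0];
}

int main (void)
{
  int v[4] = { 1, 2, 3, 4 };
  assert (reduce_add (v, 4) == 10);
  return 0;
}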
+ +(define_insn "move_lo_quad_v4si" + [(set (match_operand:V4SI 0 "s_register_operand" "+w") + (vec_concat:V4SI + (match_operand:V2SI 1 "s_register_operand" "w") + (vec_select:V2SI (match_dup 0) + (parallel [(const_int 2) (const_int 3)]))))] + "TARGET_NEON" +{ + int dest = REGNO (operands[0]); + int src = REGNO (operands[1]); + + if (dest != src) + return "vmov\t%e0, %P1"; + else + return ""; +}) + +(define_insn "move_lo_quad_v4sf" + [(set (match_operand:V4SF 0 "s_register_operand" "+w") + (vec_concat:V4SF + (match_operand:V2SF 1 "s_register_operand" "w") + (vec_select:V2SF (match_dup 0) + (parallel [(const_int 2) (const_int 3)]))))] + "TARGET_NEON" +{ + int dest = REGNO (operands[0]); + int src = REGNO (operands[1]); + + if (dest != src) + return "vmov\t%e0, %P1"; + else + return ""; +}) + +(define_insn "move_lo_quad_v8hi" + [(set (match_operand:V8HI 0 "s_register_operand" "+w") + (vec_concat:V8HI + (match_operand:V4HI 1 "s_register_operand" "w") + (vec_select:V4HI (match_dup 0) + (parallel [(const_int 4) (const_int 5) + (const_int 6) (const_int 7)]))))] + "TARGET_NEON" +{ + int dest = REGNO (operands[0]); + int src = REGNO (operands[1]); + + if (dest != src) + return "vmov\t%e0, %P1"; + else + return ""; +}) + +(define_insn "move_lo_quad_v16qi" + [(set (match_operand:V16QI 0 "s_register_operand" "+w") + (vec_concat:V16QI + (match_operand:V8QI 1 "s_register_operand" "w") + (vec_select:V8QI (match_dup 0) + (parallel [(const_int 8) (const_int 9) + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15)]))))] + "TARGET_NEON" +{ + int dest = REGNO (operands[0]); + int src = REGNO (operands[1]); + + if (dest != src) + return "vmov\t%e0, %P1"; + else + return ""; +}) + +;; Reduction operations + +(define_expand "reduc_splus_<mode>" + [(match_operand:VD 0 "s_register_operand" "") + (match_operand:VD 1 "s_register_operand" "")] + "TARGET_NEON" +{ + neon_pairwise_reduce (operands[0], operands[1], <MODE>mode, + &gen_neon_vpadd_internal<mode>); + DONE; +}) + +(define_expand "reduc_splus_<mode>" + [(match_operand:VQ 0 "s_register_operand" "") + (match_operand:VQ 1 "s_register_operand" "")] + "TARGET_NEON" +{ + rtx step1 = gen_reg_rtx (<V_HALF>mode); + rtx res_d = gen_reg_rtx (<V_HALF>mode); + + emit_insn (gen_quad_halves_plus<mode> (step1, operands[1])); + emit_insn (gen_reduc_splus_<V_half> (res_d, step1)); + emit_insn (gen_move_lo_quad_<mode> (operands[0], res_d)); + + DONE; +}) + +(define_insn "reduc_splus_v2di" + [(set (match_operand:V2DI 0 "s_register_operand" "=w") + (unspec:V2DI [(match_operand:V2DI 1 "s_register_operand" "w")] + UNSPEC_VPADD))] + "TARGET_NEON" + "vadd.i64\t%e0, %e1, %f1") + +;; NEON does not distinguish between signed and unsigned addition except on +;; widening operations. 
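In two's complement, wrapping addition yields identical lane bits whether the lanes are read as signed or unsigned, so the unsigned reduction below can simply delegate to the signed one. A one-lane demonstration in C (assumes GCC's wrapping conversion to int8_t):

#include <assert.h>
#include <stdint.h>

int main (void)
{
  uint8_t a = 0xf0, b = 0x30;
  uint8_t u = (uint8_t) (a + b);                   /* unsigned view: 240 + 48 wraps to 32 */
  uint8_t s = (uint8_t) ((int8_t) a + (int8_t) b); /* signed view: -16 + 48 = 32 */
  assert (u == s && u == 0x20);
  return 0;
}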
+(define_expand "reduc_uplus_<mode>" + [(match_operand:VDQI 0 "s_register_operand" "") + (match_operand:VDQI 1 "s_register_operand" "")] + "TARGET_NEON" +{ + emit_insn (gen_reduc_splus_<mode> (operands[0], operands[1])); + DONE; +}) + +(define_expand "reduc_smin_<mode>" + [(match_operand:VD 0 "s_register_operand" "") + (match_operand:VD 1 "s_register_operand" "")] + "TARGET_NEON" +{ + neon_pairwise_reduce (operands[0], operands[1], <MODE>mode, + &gen_neon_vpsmin<mode>); + DONE; +}) + +(define_expand "reduc_smin_<mode>" + [(match_operand:VQ 0 "s_register_operand" "") + (match_operand:VQ 1 "s_register_operand" "")] + "TARGET_NEON" +{ + rtx step1 = gen_reg_rtx (<V_HALF>mode); + rtx res_d = gen_reg_rtx (<V_HALF>mode); + + emit_insn (gen_quad_halves_smin<mode> (step1, operands[1])); + emit_insn (gen_reduc_smin_<V_half> (res_d, step1)); + emit_insn (gen_move_lo_quad_<mode> (operands[0], res_d)); + + DONE; +}) + +(define_expand "reduc_smax_<mode>" + [(match_operand:VD 0 "s_register_operand" "") + (match_operand:VD 1 "s_register_operand" "")] + "TARGET_NEON" +{ + neon_pairwise_reduce (operands[0], operands[1], <MODE>mode, + &gen_neon_vpsmax<mode>); + DONE; +}) + +(define_expand "reduc_smax_<mode>" + [(match_operand:VQ 0 "s_register_operand" "") + (match_operand:VQ 1 "s_register_operand" "")] + "TARGET_NEON" +{ + rtx step1 = gen_reg_rtx (<V_HALF>mode); + rtx res_d = gen_reg_rtx (<V_HALF>mode); + + emit_insn (gen_quad_halves_smax<mode> (step1, operands[1])); + emit_insn (gen_reduc_smax_<V_half> (res_d, step1)); + emit_insn (gen_move_lo_quad_<mode> (operands[0], res_d)); + + DONE; +}) + +(define_expand "reduc_umin_<mode>" + [(match_operand:VDI 0 "s_register_operand" "") + (match_operand:VDI 1 "s_register_operand" "")] + "TARGET_NEON" +{ + neon_pairwise_reduce (operands[0], operands[1], <MODE>mode, + &gen_neon_vpumin<mode>); + DONE; +}) + +(define_expand "reduc_umin_<mode>" + [(match_operand:VQI 0 "s_register_operand" "") + (match_operand:VQI 1 "s_register_operand" "")] + "TARGET_NEON" +{ + rtx step1 = gen_reg_rtx (<V_HALF>mode); + rtx res_d = gen_reg_rtx (<V_HALF>mode); + + emit_insn (gen_quad_halves_umin<mode> (step1, operands[1])); + emit_insn (gen_reduc_umin_<V_half> (res_d, step1)); + emit_insn (gen_move_lo_quad_<mode> (operands[0], res_d)); + + DONE; +}) + +(define_expand "reduc_umax_<mode>" + [(match_operand:VDI 0 "s_register_operand" "") + (match_operand:VDI 1 "s_register_operand" "")] + "TARGET_NEON" +{ + neon_pairwise_reduce (operands[0], operands[1], <MODE>mode, + &gen_neon_vpumax<mode>); + DONE; +}) + +(define_expand "reduc_umax_<mode>" + [(match_operand:VQI 0 "s_register_operand" "") + (match_operand:VQI 1 "s_register_operand" "")] + "TARGET_NEON" +{ + rtx step1 = gen_reg_rtx (<V_HALF>mode); + rtx res_d = gen_reg_rtx (<V_HALF>mode); + + emit_insn (gen_quad_halves_umax<mode> (step1, operands[1])); + emit_insn (gen_reduc_umax_<V_half> (res_d, step1)); + emit_insn (gen_move_lo_quad_<mode> (operands[0], res_d)); + + DONE; +}) + +(define_insn "neon_vpadd_internal<mode>" + [(set (match_operand:VD 0 "s_register_operand" "=w") + (unspec:VD [(match_operand:VD 1 "s_register_operand" "w") + (match_operand:VD 2 "s_register_operand" "w")] + UNSPEC_VPADD))] + "TARGET_NEON" + "vpadd.<V_if_elem>\t%P0, %P1, %P2") + +(define_insn "neon_vpsmin<mode>" + [(set (match_operand:VD 0 "s_register_operand" "=w") + (unspec:VD [(match_operand:VD 1 "s_register_operand" "w") + (match_operand:VD 2 "s_register_operand" "w")] + UNSPEC_VPSMIN))] + "TARGET_NEON" + "vpmin.<V_s_elem>\t%P0, %P1, %P2") + +(define_insn 
"neon_vpsmax<mode>" + [(set (match_operand:VD 0 "s_register_operand" "=w") + (unspec:VD [(match_operand:VD 1 "s_register_operand" "w") + (match_operand:VD 2 "s_register_operand" "w")] + UNSPEC_VPSMAX))] + "TARGET_NEON" + "vpmax.<V_s_elem>\t%P0, %P1, %P2") + +(define_insn "neon_vpumin<mode>" + [(set (match_operand:VDI 0 "s_register_operand" "=w") + (unspec:VDI [(match_operand:VDI 1 "s_register_operand" "w") + (match_operand:VDI 2 "s_register_operand" "w")] + UNSPEC_VPUMIN))] + "TARGET_NEON" + "vpmin.<V_u_elem>\t%P0, %P1, %P2") + +(define_insn "neon_vpumax<mode>" + [(set (match_operand:VDI 0 "s_register_operand" "=w") + (unspec:VDI [(match_operand:VDI 1 "s_register_operand" "w") + (match_operand:VDI 2 "s_register_operand" "w")] + UNSPEC_VPUMAX))] + "TARGET_NEON" + "vpmax.<V_u_elem>\t%P0, %P1, %P2") + +;; Saturating arithmetic + +; NOTE: Neon supports many more saturating variants of instructions than the +; following, but these are all GCC currently understands. +; FIXME: Actually, GCC doesn't know how to create saturating add/sub by itself +; yet either, although these patterns may be used by intrinsics when they're +; added. + +(define_insn "*ss_add<mode>_neon" + [(set (match_operand:VD 0 "s_register_operand" "=w") + (ss_plus:VD (match_operand:VD 1 "s_register_operand" "w") + (match_operand:VD 2 "s_register_operand" "w")))] + "TARGET_NEON" + "vqadd.<V_s_elem>\t%P0, %P1, %P2") + +(define_insn "*us_add<mode>_neon" + [(set (match_operand:VD 0 "s_register_operand" "=w") + (us_plus:VD (match_operand:VD 1 "s_register_operand" "w") + (match_operand:VD 2 "s_register_operand" "w")))] + "TARGET_NEON" + "vqadd.<V_u_elem>\t%P0, %P1, %P2") + +(define_insn "*ss_sub<mode>_neon" + [(set (match_operand:VD 0 "s_register_operand" "=w") + (ss_minus:VD (match_operand:VD 1 "s_register_operand" "w") + (match_operand:VD 2 "s_register_operand" "w")))] + "TARGET_NEON" + "vqsub.<V_s_elem>\t%P0, %P1, %P2") + +(define_insn "*us_sub<mode>_neon" + [(set (match_operand:VD 0 "s_register_operand" "=w") + (us_minus:VD (match_operand:VD 1 "s_register_operand" "w") + (match_operand:VD 2 "s_register_operand" "w")))] + "TARGET_NEON" + "vqsub.<V_u_elem>\t%P0, %P1, %P2") + +;; Patterns for builtins. + +; good for plain vadd, vaddq. + +(define_insn "neon_vadd<mode>" + [(set (match_operand:VDQX 0 "s_register_operand" "=w") + (unspec:VDQX [(match_operand:VDQX 1 "s_register_operand" "w") + (match_operand:VDQX 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VADD))] + "TARGET_NEON" + "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +; operand 3 represents in bits: +; bit 0: signed (vs unsigned). +; bit 1: rounding (vs none). + +(define_insn "neon_vaddl<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:VDI 1 "s_register_operand" "w") + (match_operand:VDI 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VADDL))] + "TARGET_NEON" + "vaddl.%T3%#<V_sz_elem>\t%q0, %P1, %P2") + +(define_insn "neon_vaddw<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "w") + (match_operand:VDI 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VADDW))] + "TARGET_NEON" + "vaddw.%T3%#<V_sz_elem>\t%q0, %q1, %P2") + +; vhadd and vrhadd. 
+ +(define_insn "neon_vhadd<mode>" + [(set (match_operand:VDQIW 0 "s_register_operand" "=w") + (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w") + (match_operand:VDQIW 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VHADD))] + "TARGET_NEON" + "v%O3hadd.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vqadd<mode>" + [(set (match_operand:VDQIX 0 "s_register_operand" "=w") + (unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "w") + (match_operand:VDQIX 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VQADD))] + "TARGET_NEON" + "vqadd.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vaddhn<mode>" + [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w") + (unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w") + (match_operand:VN 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VADDHN))] + "TARGET_NEON" + "v%O3addhn.<V_if_elem>\t%P0, %q1, %q2") + +(define_insn "neon_vmul<mode>" + [(set (match_operand:VDQW 0 "s_register_operand" "=w") + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "w") + (match_operand:VDQW 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VMUL))] + "TARGET_NEON" + "vmul.%F3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vmla<mode>" + [(set (match_operand:VDQW 0 "s_register_operand" "=w") + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") + (match_operand:VDQW 2 "s_register_operand" "w") + (match_operand:VDQW 3 "s_register_operand" "w") + (match_operand:SI 4 "immediate_operand" "i")] + UNSPEC_VMLA))] + "TARGET_NEON" + "vmla.<V_if_elem>\t%<V_reg>0, %<V_reg>2, %<V_reg>3") + +(define_insn "neon_vmlal<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "0") + (match_operand:VW 2 "s_register_operand" "w") + (match_operand:VW 3 "s_register_operand" "w") + (match_operand:SI 4 "immediate_operand" "i")] + UNSPEC_VMLAL))] + "TARGET_NEON" + "vmlal.%T4%#<V_sz_elem>\t%q0, %P2, %P3") + +(define_insn "neon_vmls<mode>" + [(set (match_operand:VDQW 0 "s_register_operand" "=w") + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") + (match_operand:VDQW 2 "s_register_operand" "w") + (match_operand:VDQW 3 "s_register_operand" "w") + (match_operand:SI 4 "immediate_operand" "i")] + UNSPEC_VMLS))] + "TARGET_NEON" + "vmls.<V_if_elem>\t%<V_reg>0, %<V_reg>2, %<V_reg>3") + +(define_insn "neon_vmlsl<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "0") + (match_operand:VW 2 "s_register_operand" "w") + (match_operand:VW 3 "s_register_operand" "w") + (match_operand:SI 4 "immediate_operand" "i")] + UNSPEC_VMLSL))] + "TARGET_NEON" + "vmlsl.%T4%#<V_sz_elem>\t%q0, %P2, %P3") + +(define_insn "neon_vqdmulh<mode>" + [(set (match_operand:VMDQI 0 "s_register_operand" "=w") + (unspec:VMDQI [(match_operand:VMDQI 1 "s_register_operand" "w") + (match_operand:VMDQI 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VQDMULH))] + "TARGET_NEON" + "vq%O3dmulh.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vqdmlal<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "0") + (match_operand:VMDI 2 "s_register_operand" "w") + 
(match_operand:VMDI 3 "s_register_operand" "w") + (match_operand:SI 4 "immediate_operand" "i")] + UNSPEC_VQDMLAL))] + "TARGET_NEON" + "vqdmlal.<V_s_elem>\t%q0, %P2, %P3") + +(define_insn "neon_vqdmlsl<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "0") + (match_operand:VMDI 2 "s_register_operand" "w") + (match_operand:VMDI 3 "s_register_operand" "w") + (match_operand:SI 4 "immediate_operand" "i")] + UNSPEC_VQDMLSL))] + "TARGET_NEON" + "vqdmlsl.<V_s_elem>\t%q0, %P2, %P3") + +(define_insn "neon_vmull<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:VW 1 "s_register_operand" "w") + (match_operand:VW 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VMULL))] + "TARGET_NEON" + "vmull.%T3%#<V_sz_elem>\t%q0, %P1, %P2") + +(define_insn "neon_vqdmull<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:VMDI 1 "s_register_operand" "w") + (match_operand:VMDI 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VQDMULL))] + "TARGET_NEON" + "vqdmull.<V_s_elem>\t%q0, %P1, %P2") + +(define_insn "neon_vsub<mode>" + [(set (match_operand:VDQX 0 "s_register_operand" "=w") + (unspec:VDQX [(match_operand:VDQX 1 "s_register_operand" "w") + (match_operand:VDQX 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VSUB))] + "TARGET_NEON" + "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vsubl<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:VDI 1 "s_register_operand" "w") + (match_operand:VDI 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VSUBL))] + "TARGET_NEON" + "vsubl.%T3%#<V_sz_elem>\t%q0, %P1, %P2") + +(define_insn "neon_vsubw<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "w") + (match_operand:VDI 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VSUBW))] + "TARGET_NEON" + "vsubw.%T3%#<V_sz_elem>\t%q0, %q1, %P2") + +(define_insn "neon_vqsub<mode>" + [(set (match_operand:VDQIX 0 "s_register_operand" "=w") + (unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "w") + (match_operand:VDQIX 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VQSUB))] + "TARGET_NEON" + "vqsub.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vhsub<mode>" + [(set (match_operand:VDQIW 0 "s_register_operand" "=w") + (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w") + (match_operand:VDQIW 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VHSUB))] + "TARGET_NEON" + "vhsub.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vsubhn<mode>" + [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w") + (unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w") + (match_operand:VN 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VSUBHN))] + "TARGET_NEON" + "v%O3subhn.<V_if_elem>\t%P0, %q1, %q2") + +(define_insn "neon_vceq<mode>" + [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w") + (unspec:<V_cmp_result> [(match_operand:VDQW 1 "s_register_operand" "w") + (match_operand:VDQW 2 "s_register_operand" "w") + 
(match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VCEQ))] + "TARGET_NEON" + "vceq.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vcge<mode>" + [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w") + (unspec:<V_cmp_result> [(match_operand:VDQW 1 "s_register_operand" "w") + (match_operand:VDQW 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VCGE))] + "TARGET_NEON" + "vcge.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vcgt<mode>" + [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w") + (unspec:<V_cmp_result> [(match_operand:VDQW 1 "s_register_operand" "w") + (match_operand:VDQW 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VCGT))] + "TARGET_NEON" + "vcgt.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vcage<mode>" + [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w") + (unspec:<V_cmp_result> [(match_operand:VCVTF 1 "s_register_operand" "w") + (match_operand:VCVTF 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VCAGE))] + "TARGET_NEON" + "vacge.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vcagt<mode>" + [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w") + (unspec:<V_cmp_result> [(match_operand:VCVTF 1 "s_register_operand" "w") + (match_operand:VCVTF 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VCAGT))] + "TARGET_NEON" + "vacgt.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vtst<mode>" + [(set (match_operand:VDQIW 0 "s_register_operand" "=w") + (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w") + (match_operand:VDQIW 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VTST))] + "TARGET_NEON" + "vtst.<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vabd<mode>" + [(set (match_operand:VDQW 0 "s_register_operand" "=w") + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "w") + (match_operand:VDQW 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VABD))] + "TARGET_NEON" + "vabd.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vabdl<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:VW 1 "s_register_operand" "w") + (match_operand:VW 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VABDL))] + "TARGET_NEON" + "vabdl.%T3%#<V_sz_elem>\t%q0, %P1, %P2") + +(define_insn "neon_vaba<mode>" + [(set (match_operand:VDQIW 0 "s_register_operand" "=w") + (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "0") + (match_operand:VDQIW 2 "s_register_operand" "w") + (match_operand:VDQIW 3 "s_register_operand" "w") + (match_operand:SI 4 "immediate_operand" "i")] + UNSPEC_VABA))] + "TARGET_NEON" + "vaba.%T4%#<V_sz_elem>\t%<V_reg>0, %<V_reg>2, %<V_reg>3") + +(define_insn "neon_vabal<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "0") + (match_operand:VW 2 "s_register_operand" "w") + (match_operand:VW 3 "s_register_operand" "w") + (match_operand:SI 4 "immediate_operand" "i")] + UNSPEC_VABAL))] + "TARGET_NEON" + "vabal.%T4%#<V_sz_elem>\t%q0, %P2, %P3") + +(define_insn "neon_vmax<mode>" + [(set (match_operand:VDQW 0 "s_register_operand" "=w") + (unspec:VDQW [(match_operand:VDQW 1 
"s_register_operand" "w") + (match_operand:VDQW 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VMAX))] + "TARGET_NEON" + "vmax.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vmin<mode>" + [(set (match_operand:VDQW 0 "s_register_operand" "=w") + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "w") + (match_operand:VDQW 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VMIN))] + "TARGET_NEON" + "vmin.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_expand "neon_vpadd<mode>" + [(match_operand:VD 0 "s_register_operand" "=w") + (match_operand:VD 1 "s_register_operand" "w") + (match_operand:VD 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + "TARGET_NEON" +{ + emit_insn (gen_neon_vpadd_internal<mode> (operands[0], operands[1], + operands[2])); + DONE; +}) + +(define_insn "neon_vpaddl<mode>" + [(set (match_operand:<V_double_width> 0 "s_register_operand" "=w") + (unspec:<V_double_width> [(match_operand:VDQIW 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VPADDL))] + "TARGET_NEON" + "vpaddl.%T2%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1") + +(define_insn "neon_vpadal<mode>" + [(set (match_operand:<V_double_width> 0 "s_register_operand" "=w") + (unspec:<V_double_width> [(match_operand:<V_double_width> 1 "s_register_operand" "0") + (match_operand:VDQIW 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VPADAL))] + "TARGET_NEON" + "vpadal.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>2") + +(define_insn "neon_vpmax<mode>" + [(set (match_operand:VD 0 "s_register_operand" "=w") + (unspec:VD [(match_operand:VD 1 "s_register_operand" "w") + (match_operand:VD 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VPMAX))] + "TARGET_NEON" + "vpmax.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vpmin<mode>" + [(set (match_operand:VD 0 "s_register_operand" "=w") + (unspec:VD [(match_operand:VD 1 "s_register_operand" "w") + (match_operand:VD 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VPMIN))] + "TARGET_NEON" + "vpmin.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vrecps<mode>" + [(set (match_operand:VCVTF 0 "s_register_operand" "=w") + (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w") + (match_operand:VCVTF 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VRECPS))] + "TARGET_NEON" + "vrecps.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vrsqrts<mode>" + [(set (match_operand:VCVTF 0 "s_register_operand" "=w") + (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w") + (match_operand:VCVTF 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VRSQRTS))] + "TARGET_NEON" + "vrsqrts.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vabs<mode>" + [(set (match_operand:VDQW 0 "s_register_operand" "=w") + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VABS))] + "TARGET_NEON" + "vabs.<V_s_elem>\t%<V_reg>0, %<V_reg>1") + +(define_insn "neon_vqabs<mode>" + [(set (match_operand:VDQIW 0 "s_register_operand" "=w") + (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VQABS))] + "TARGET_NEON" + "vqabs.<V_s_elem>\t%<V_reg>0, 
%<V_reg>1") + +(define_expand "neon_vneg<mode>" + [(match_operand:VDQW 0 "s_register_operand" "") + (match_operand:VDQW 1 "s_register_operand" "") + (match_operand:SI 2 "immediate_operand" "")] + "TARGET_NEON" +{ + emit_insn (gen_neg<mode>2 (operands[0], operands[1])); + DONE; +}) + +(define_insn "neon_vqneg<mode>" + [(set (match_operand:VDQIW 0 "s_register_operand" "=w") + (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VQNEG))] + "TARGET_NEON" + "vqneg.<V_s_elem>\t%<V_reg>0, %<V_reg>1") + +(define_insn "neon_vcls<mode>" + [(set (match_operand:VDQIW 0 "s_register_operand" "=w") + (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VCLS))] + "TARGET_NEON" + "vcls.<V_s_elem>\t%<V_reg>0, %<V_reg>1") + +(define_insn "neon_vclz<mode>" + [(set (match_operand:VDQIW 0 "s_register_operand" "=w") + (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VCLZ))] + "TARGET_NEON" + "vclz.<V_if_elem>\t%<V_reg>0, %<V_reg>1") + +(define_insn "neon_vcnt<mode>" + [(set (match_operand:VE 0 "s_register_operand" "=w") + (unspec:VE [(match_operand:VE 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VCNT))] + "TARGET_NEON" + "vcnt.<V_sz_elem>\t%<V_reg>0, %<V_reg>1") + +(define_insn "neon_vrecpe<mode>" + [(set (match_operand:V32 0 "s_register_operand" "=w") + (unspec:V32 [(match_operand:V32 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VRECPE))] + "TARGET_NEON" + "vrecpe.<V_u_elem>\t%<V_reg>0, %<V_reg>1") + +(define_insn "neon_vrsqrte<mode>" + [(set (match_operand:V32 0 "s_register_operand" "=w") + (unspec:V32 [(match_operand:V32 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VRSQRTE))] + "TARGET_NEON" + "vrsqrte.<V_u_elem>\t%<V_reg>0, %<V_reg>1") + +(define_expand "neon_vmvn<mode>" + [(match_operand:VDQIW 0 "s_register_operand" "") + (match_operand:VDQIW 1 "s_register_operand" "") + (match_operand:SI 2 "immediate_operand" "")] + "TARGET_NEON" +{ + emit_insn (gen_one_cmpl<mode>2 (operands[0], operands[1])); + DONE; +}) + +;; FIXME: 32-bit element sizes are a bit funky (should be output as .32 not +;; .u32), but the assembler should cope with that. + +(define_insn "neon_vget_lane<mode>" + [(set (match_operand:<V_elem> 0 "s_register_operand" "=r") + (unspec:<V_elem> [(match_operand:VD 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VGET_LANE))] + "TARGET_NEON" + "vmov%?.%t3%#<V_sz_elem>\t%0, %P1[%c2]" + [(set_attr "predicable" "yes")]) + +; Operand 2 (lane number) is ignored because we can only extract the zeroth lane +; with this insn. Operand 3 (info word) is ignored because it does nothing +; useful with 64-bit elements. 
+ +(define_insn "neon_vget_lanedi" + [(set (match_operand:DI 0 "s_register_operand" "=r") + (unspec:DI [(match_operand:DI 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VGET_LANE))] + "TARGET_NEON" + "vmov%?\t%Q0, %R0, %P1 @ di" + [(set_attr "predicable" "yes")]) + +(define_insn "neon_vget_lane<mode>" + [(set (match_operand:<V_elem> 0 "s_register_operand" "=r") + (unspec:<V_elem> [(match_operand:VQ 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VGET_LANE))] + "TARGET_NEON" +{ + rtx ops[4]; + int regno = REGNO (operands[1]); + unsigned int halfelts = GET_MODE_NUNITS (<MODE>mode) / 2; + unsigned int elt = INTVAL (operands[2]); + + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (<V_HALF>mode, regno + 2 * (elt / halfelts)); + ops[2] = GEN_INT (elt % halfelts); + ops[3] = operands[3]; + output_asm_insn ("vmov%?.%t3%#<V_sz_elem>\t%0, %P1[%c2]", ops); + + return ""; +} + [(set_attr "predicable" "yes")]) + +(define_insn "neon_vget_lanev2di" + [(set (match_operand:DI 0 "s_register_operand" "=r") + (unspec:DI [(match_operand:V2DI 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VGET_LANE))] + "TARGET_NEON" +{ + rtx ops[2]; + unsigned int regno = REGNO (operands[1]); + unsigned int elt = INTVAL (operands[2]); + + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (DImode, regno + 2 * elt); + output_asm_insn ("vmov%?\t%Q0, %R0, %P1 @ v2di", ops); + + return ""; +} + [(set_attr "predicable" "yes")]) + + +(define_insn "neon_vset_lane<mode>" + [(set (match_operand:VD 0 "s_register_operand" "=w") + (unspec:VD [(match_operand:<V_elem> 1 "s_register_operand" "r") + (match_operand:VD 2 "s_register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VSET_LANE))] + "TARGET_NEON" + "vmov%?.<V_sz_elem>\t%P0[%c3], %1" + [(set_attr "predicable" "yes")]) + +; See neon_vget_lanedi comment for reasons operands 2 & 3 are ignored. 
+ +(define_insn "neon_vset_lanedi" + [(set (match_operand:DI 0 "s_register_operand" "=w") + (unspec:DI [(match_operand:DI 1 "s_register_operand" "r") + (match_operand:DI 2 "s_register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VSET_LANE))] + "TARGET_NEON" + "vmov%?\t%P0, %Q1, %R1 @ di" + [(set_attr "predicable" "yes")]) + +(define_insn "neon_vset_lane<mode>" + [(set (match_operand:VQ 0 "s_register_operand" "=w") + (unspec:VQ [(match_operand:<V_elem> 1 "s_register_operand" "r") + (match_operand:VQ 2 "s_register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VSET_LANE))] + "TARGET_NEON" +{ + rtx ops[4]; + unsigned int regno = REGNO (operands[0]); + unsigned int halfelts = GET_MODE_NUNITS (<MODE>mode) / 2; + unsigned int elt = INTVAL (operands[3]); + + ops[0] = gen_rtx_REG (<V_HALF>mode, regno + 2 * (elt / halfelts)); + ops[1] = operands[1]; + ops[2] = GEN_INT (elt % halfelts); + output_asm_insn ("vmov%?.<V_sz_elem>\t%P0[%c2], %1", ops); + + return ""; +} + [(set_attr "predicable" "yes")]) + +(define_insn "neon_vset_lanev2di" + [(set (match_operand:V2DI 0 "s_register_operand" "=w") + (unspec:V2DI [(match_operand:DI 1 "s_register_operand" "r") + (match_operand:V2DI 2 "s_register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VSET_LANE))] + "TARGET_NEON" +{ + rtx ops[2]; + unsigned int regno = REGNO (operands[0]); + unsigned int elt = INTVAL (operands[3]); + + ops[0] = gen_rtx_REG (DImode, regno + 2 * elt); + ops[1] = operands[1]; + output_asm_insn ("vmov%?\t%P0, %Q1, %R1 @ v2di", ops); + + return ""; +} + [(set_attr "predicable" "yes")]) + +(define_expand "neon_vcreate<mode>" + [(match_operand:VDX 0 "s_register_operand" "") + (match_operand:DI 1 "general_operand" "")] + "TARGET_NEON" +{ + rtx src = gen_lowpart (<MODE>mode, operands[1]); + emit_move_insn (operands[0], src); + DONE; +}) + +(define_insn "neon_vdup_n<mode>" + [(set (match_operand:VDQW 0 "s_register_operand" "=w") + (unspec:VDQW [(match_operand:<V_elem> 1 "s_register_operand" "r")] + UNSPEC_VDUP_N))] + "TARGET_NEON" + "vdup%?.<V_sz_elem>\t%<V_reg>0, %1" + [(set_attr "predicable" "yes")]) + +(define_insn "neon_vdup_ndi" + [(set (match_operand:DI 0 "s_register_operand" "=w") + (unspec:DI [(match_operand:DI 1 "s_register_operand" "r")] + UNSPEC_VDUP_N))] + "TARGET_NEON" + "vmov%?\t%P0, %Q1, %R1" + [(set_attr "predicable" "yes")]) + +(define_insn "neon_vdup_nv2di" + [(set (match_operand:V2DI 0 "s_register_operand" "=w") + (unspec:V2DI [(match_operand:DI 1 "s_register_operand" "r")] + UNSPEC_VDUP_N))] + "TARGET_NEON" + "vmov%?\t%e0, %Q1, %R1\;vmov%?\t%f0, %Q1, %R1" + [(set_attr "predicable" "yes") + (set_attr "length" "8")]) + +(define_insn "neon_vdup_lane<mode>" + [(set (match_operand:VD 0 "s_register_operand" "=w") + (unspec:VD [(match_operand:VD 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VDUP_LANE))] + "TARGET_NEON" + "vdup.<V_sz_elem>\t%P0, %P1[%c2]") + +(define_insn "neon_vdup_lane<mode>" + [(set (match_operand:VQ 0 "s_register_operand" "=w") + (unspec:VQ [(match_operand:<V_HALF> 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VDUP_LANE))] + "TARGET_NEON" + "vdup.<V_sz_elem>\t%q0, %P1[%c2]") + +; Scalar index is ignored, since only zero is valid here. 
+(define_expand "neon_vdup_lanedi" + [(set (match_operand:DI 0 "s_register_operand" "=w") + (unspec:DI [(match_operand:DI 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VDUP_LANE))] + "TARGET_NEON" +{ + emit_move_insn (operands[0], operands[1]); + DONE; +}) + +; Likewise. +(define_insn "neon_vdup_lanev2di" + [(set (match_operand:V2DI 0 "s_register_operand" "=w") + (unspec:V2DI [(match_operand:DI 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VDUP_LANE))] + "TARGET_NEON" + "vmov\t%e0, %P1\;vmov\t%f0, %P1" + [(set_attr "length" "8")]) + +;; In this insn, operand 1 should be low, and operand 2 the high part of the +;; dest vector. +;; FIXME: A different implementation of this builtin could make it much +;; more likely that we wouldn't actually need to output anything (we could make +;; it so that the reg allocator puts things in the right places magically +;; instead). Lack of subregs for vectors makes that tricky though, I think. + +(define_insn "neon_vcombine<mode>" + [(set (match_operand:<V_DOUBLE> 0 "s_register_operand" "=w") + (unspec:<V_DOUBLE> [(match_operand:VDX 1 "s_register_operand" "w") + (match_operand:VDX 2 "s_register_operand" "w")] + UNSPEC_VCOMBINE))] + "TARGET_NEON" +{ + int dest = REGNO (operands[0]); + int src1 = REGNO (operands[1]); + int src2 = REGNO (operands[2]); + rtx destlo; + + if (src1 == dest && src2 == dest + 2) + return ""; + else if (src2 == dest && src1 == dest + 2) + /* Special case of reversed high/low parts. */ + return "vswp\t%P1, %P2"; + + destlo = gen_rtx_REG (<MODE>mode, dest); + + if (!reg_overlap_mentioned_p (operands[2], destlo)) + { + /* Try to avoid unnecessary moves if part of the result is in the right + place already. */ + if (src1 != dest) + output_asm_insn ("vmov\t%e0, %P1", operands); + if (src2 != dest + 2) + output_asm_insn ("vmov\t%f0, %P2", operands); + } + else + { + if (src2 != dest + 2) + output_asm_insn ("vmov\t%f0, %P2", operands); + if (src1 != dest) + output_asm_insn ("vmov\t%e0, %P1", operands); + } + + return ""; +} + [(set_attr "length" "8")]) + +(define_insn "neon_vget_high<mode>" + [(set (match_operand:<V_HALF> 0 "s_register_operand" "=w") + (unspec:<V_HALF> [(match_operand:VQX 1 "s_register_operand" "w")] + UNSPEC_VGET_HIGH))] + "TARGET_NEON" +{ + int dest = REGNO (operands[0]); + int src = REGNO (operands[1]); + + if (dest != src + 2) + return "vmov\t%P0, %f1"; + else + return ""; +}) + +(define_insn "neon_vget_low<mode>" + [(set (match_operand:<V_HALF> 0 "s_register_operand" "=w") + (unspec:<V_HALF> [(match_operand:VQX 1 "s_register_operand" "w")] + UNSPEC_VGET_LOW))] + "TARGET_NEON" +{ + int dest = REGNO (operands[0]); + int src = REGNO (operands[1]); + + if (dest != src) + return "vmov\t%P0, %e1"; + else + return ""; +}) + +(define_insn "neon_vcvt<mode>" + [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w") + (unspec:<V_CVTTO> [(match_operand:VCVTF 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VCVT))] + "TARGET_NEON" + "vcvt.%T2%#32.f32\t%<V_reg>0, %<V_reg>1") + +(define_insn "neon_vcvt<mode>" + [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w") + (unspec:<V_CVTTO> [(match_operand:VCVTI 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VCVT))] + "TARGET_NEON" + "vcvt.f32.%T2%#32\t%<V_reg>0, %<V_reg>1") + +(define_insn "neon_vcvt_n<mode>" + [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w") + (unspec:<V_CVTTO> [(match_operand:VCVTF 1 
"s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VCVT_N))] + "TARGET_NEON" + "vcvt.%T3%#32.f32\t%<V_reg>0, %<V_reg>1, %2") + +(define_insn "neon_vcvt_n<mode>" + [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w") + (unspec:<V_CVTTO> [(match_operand:VCVTI 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VCVT_N))] + "TARGET_NEON" + "vcvt.f32.%T3%#32\t%<V_reg>0, %<V_reg>1, %2") + +(define_insn "neon_vmovn<mode>" + [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w") + (unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VMOVN))] + "TARGET_NEON" + "vmovn.<V_if_elem>\t%P0, %q1") + +(define_insn "neon_vqmovn<mode>" + [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w") + (unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VQMOVN))] + "TARGET_NEON" + "vqmovn.%T2%#<V_sz_elem>\t%P0, %q1") + +(define_insn "neon_vqmovun<mode>" + [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w") + (unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VQMOVUN))] + "TARGET_NEON" + "vqmovun.<V_s_elem>\t%P0, %q1") + +(define_insn "neon_vmovl<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:VW 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VMOVL))] + "TARGET_NEON" + "vmovl.%T2%#<V_sz_elem>\t%q0, %P1") + +(define_insn "neon_vmul_lane<mode>" + [(set (match_operand:VMD 0 "s_register_operand" "=w") + (unspec:VMD [(match_operand:VMD 1 "s_register_operand" "w") + (match_operand:VMD 2 "s_register_operand" + "<scalar_mul_constraint>") + (match_operand:SI 3 "immediate_operand" "i") + (match_operand:SI 4 "immediate_operand" "i")] + UNSPEC_VMUL_LANE))] + "TARGET_NEON" + "vmul.<V_if_elem>\t%P0, %P1, %P2[%c3]") + +(define_insn "neon_vmul_lane<mode>" + [(set (match_operand:VMQ 0 "s_register_operand" "=w") + (unspec:VMQ [(match_operand:VMQ 1 "s_register_operand" "w") + (match_operand:<V_HALF> 2 "s_register_operand" + "<scalar_mul_constraint>") + (match_operand:SI 3 "immediate_operand" "i") + (match_operand:SI 4 "immediate_operand" "i")] + UNSPEC_VMUL_LANE))] + "TARGET_NEON" + "vmul.<V_if_elem>\t%q0, %q1, %P2[%c3]") + +(define_insn "neon_vmull_lane<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:VMDI 1 "s_register_operand" "w") + (match_operand:VMDI 2 "s_register_operand" + "<scalar_mul_constraint>") + (match_operand:SI 3 "immediate_operand" "i") + (match_operand:SI 4 "immediate_operand" "i")] + UNSPEC_VMULL_LANE))] + "TARGET_NEON" + "vmull.%T4%#<V_sz_elem>\t%q0, %P1, %P2[%c3]") + +(define_insn "neon_vqdmull_lane<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:VMDI 1 "s_register_operand" "w") + (match_operand:VMDI 2 "s_register_operand" + "<scalar_mul_constraint>") + (match_operand:SI 3 "immediate_operand" "i") + (match_operand:SI 4 "immediate_operand" "i")] + UNSPEC_VQDMULL_LANE))] + "TARGET_NEON" + "vqdmull.<V_s_elem>\t%q0, %P1, %P2[%c3]") + +(define_insn "neon_vqdmulh_lane<mode>" + [(set (match_operand:VMQI 0 "s_register_operand" "=w") + (unspec:VMQI [(match_operand:VMQI 1 "s_register_operand" "w") + (match_operand:<V_HALF> 2 
"s_register_operand" + "<scalar_mul_constraint>") + (match_operand:SI 3 "immediate_operand" "i") + (match_operand:SI 4 "immediate_operand" "i")] + UNSPEC_VQDMULH_LANE))] + "TARGET_NEON" + "vq%O4dmulh.%T4%#<V_sz_elem>\t%q0, %q1, %P2[%c3]") + +(define_insn "neon_vqdmulh_lane<mode>" + [(set (match_operand:VMDI 0 "s_register_operand" "=w") + (unspec:VMDI [(match_operand:VMDI 1 "s_register_operand" "w") + (match_operand:VMDI 2 "s_register_operand" + "<scalar_mul_constraint>") + (match_operand:SI 3 "immediate_operand" "i") + (match_operand:SI 4 "immediate_operand" "i")] + UNSPEC_VQDMULH_LANE))] + "TARGET_NEON" + "vq%O4dmulh.%T4%#<V_sz_elem>\t%P0, %P1, %P2[%c3]") + +(define_insn "neon_vmla_lane<mode>" + [(set (match_operand:VMD 0 "s_register_operand" "=w") + (unspec:VMD [(match_operand:VMD 1 "s_register_operand" "0") + (match_operand:VMD 2 "s_register_operand" "w") + (match_operand:VMD 3 "s_register_operand" + "<scalar_mul_constraint>") + (match_operand:SI 4 "immediate_operand" "i") + (match_operand:SI 5 "immediate_operand" "i")] + UNSPEC_VMLA_LANE))] + "TARGET_NEON" + "vmla.<V_if_elem>\t%P0, %P2, %P3[%c4]") + +(define_insn "neon_vmla_lane<mode>" + [(set (match_operand:VMQ 0 "s_register_operand" "=w") + (unspec:VMQ [(match_operand:VMQ 1 "s_register_operand" "0") + (match_operand:VMQ 2 "s_register_operand" "w") + (match_operand:<V_HALF> 3 "s_register_operand" + "<scalar_mul_constraint>") + (match_operand:SI 4 "immediate_operand" "i") + (match_operand:SI 5 "immediate_operand" "i")] + UNSPEC_VMLA_LANE))] + "TARGET_NEON" + "vmla.<V_if_elem>\t%q0, %q2, %P3[%c4]") + +(define_insn "neon_vmlal_lane<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "0") + (match_operand:VMDI 2 "s_register_operand" "w") + (match_operand:VMDI 3 "s_register_operand" + "<scalar_mul_constraint>") + (match_operand:SI 4 "immediate_operand" "i") + (match_operand:SI 5 "immediate_operand" "i")] + UNSPEC_VMLAL_LANE))] + "TARGET_NEON" + "vmlal.%T5%#<V_sz_elem>\t%q0, %P2, %P3[%c4]") + +(define_insn "neon_vqdmlal_lane<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "0") + (match_operand:VMDI 2 "s_register_operand" "w") + (match_operand:VMDI 3 "s_register_operand" + "<scalar_mul_constraint>") + (match_operand:SI 4 "immediate_operand" "i") + (match_operand:SI 5 "immediate_operand" "i")] + UNSPEC_VQDMLAL_LANE))] + "TARGET_NEON" + "vqdmlal.<V_s_elem>\t%q0, %P2, %P3[%c4]") + +(define_insn "neon_vmls_lane<mode>" + [(set (match_operand:VMD 0 "s_register_operand" "=w") + (unspec:VMD [(match_operand:VMD 1 "s_register_operand" "0") + (match_operand:VMD 2 "s_register_operand" "w") + (match_operand:VMD 3 "s_register_operand" + "<scalar_mul_constraint>") + (match_operand:SI 4 "immediate_operand" "i") + (match_operand:SI 5 "immediate_operand" "i")] + UNSPEC_VMLS_LANE))] + "TARGET_NEON" + "vmls.<V_if_elem>\t%P0, %P2, %P3[%c4]") + +(define_insn "neon_vmls_lane<mode>" + [(set (match_operand:VMQ 0 "s_register_operand" "=w") + (unspec:VMQ [(match_operand:VMQ 1 "s_register_operand" "0") + (match_operand:VMQ 2 "s_register_operand" "w") + (match_operand:<V_HALF> 3 "s_register_operand" + "<scalar_mul_constraint>") + (match_operand:SI 4 "immediate_operand" "i") + (match_operand:SI 5 "immediate_operand" "i")] + UNSPEC_VMLS_LANE))] + "TARGET_NEON" + "vmls.<V_if_elem>\t%q0, %q2, %P3[%c4]") + +(define_insn "neon_vmlsl_lane<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" 
"=w") + (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "0") + (match_operand:VMDI 2 "s_register_operand" "w") + (match_operand:VMDI 3 "s_register_operand" + "<scalar_mul_constraint>") + (match_operand:SI 4 "immediate_operand" "i") + (match_operand:SI 5 "immediate_operand" "i")] + UNSPEC_VMLSL_LANE))] + "TARGET_NEON" + "vmlsl.%T5%#<V_sz_elem>\t%q0, %P2, %P3[%c4]") + +(define_insn "neon_vqdmlsl_lane<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:<V_widen> 1 "s_register_operand" "0") + (match_operand:VMDI 2 "s_register_operand" "w") + (match_operand:VMDI 3 "s_register_operand" + "<scalar_mul_constraint>") + (match_operand:SI 4 "immediate_operand" "i") + (match_operand:SI 5 "immediate_operand" "i")] + UNSPEC_VQDMLSL_LANE))] + "TARGET_NEON" + "vqdmlsl.<V_s_elem>\t%q0, %P2, %P3[%c4]") + +; FIXME: For the "_n" multiply/multiply-accumulate insns, we copy a value in a +; core register into a temp register, then use a scalar taken from that. This +; isn't an optimal solution if e.g. the scalar has just been read from memory +; or extracted from another vector. The latter case it's currently better to +; use the "_lane" variant, and the former case can probably be implemented +; using vld1_lane, but that hasn't been done yet. + +(define_expand "neon_vmul_n<mode>" + [(match_operand:VMD 0 "s_register_operand" "") + (match_operand:VMD 1 "s_register_operand" "") + (match_operand:<V_elem> 2 "s_register_operand" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_NEON" +{ + rtx tmp = gen_reg_rtx (<MODE>mode); + emit_insn (gen_neon_vset_lane<mode> (tmp, operands[2], tmp, const0_rtx)); + emit_insn (gen_neon_vmul_lane<mode> (operands[0], operands[1], tmp, + const0_rtx, const0_rtx)); + DONE; +}) + +(define_expand "neon_vmul_n<mode>" + [(match_operand:VMQ 0 "s_register_operand" "") + (match_operand:VMQ 1 "s_register_operand" "") + (match_operand:<V_elem> 2 "s_register_operand" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_NEON" +{ + rtx tmp = gen_reg_rtx (<V_HALF>mode); + emit_insn (gen_neon_vset_lane<V_half> (tmp, operands[2], tmp, const0_rtx)); + emit_insn (gen_neon_vmul_lane<mode> (operands[0], operands[1], tmp, + const0_rtx, const0_rtx)); + DONE; +}) + +(define_expand "neon_vmull_n<mode>" + [(match_operand:<V_widen> 0 "s_register_operand" "") + (match_operand:VMDI 1 "s_register_operand" "") + (match_operand:<V_elem> 2 "s_register_operand" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_NEON" +{ + rtx tmp = gen_reg_rtx (<MODE>mode); + emit_insn (gen_neon_vset_lane<mode> (tmp, operands[2], tmp, const0_rtx)); + emit_insn (gen_neon_vmull_lane<mode> (operands[0], operands[1], tmp, + const0_rtx, operands[3])); + DONE; +}) + +(define_expand "neon_vqdmull_n<mode>" + [(match_operand:<V_widen> 0 "s_register_operand" "") + (match_operand:VMDI 1 "s_register_operand" "") + (match_operand:<V_elem> 2 "s_register_operand" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_NEON" +{ + rtx tmp = gen_reg_rtx (<MODE>mode); + emit_insn (gen_neon_vset_lane<mode> (tmp, operands[2], tmp, const0_rtx)); + emit_insn (gen_neon_vqdmull_lane<mode> (operands[0], operands[1], tmp, + const0_rtx, const0_rtx)); + DONE; +}) + +(define_expand "neon_vqdmulh_n<mode>" + [(match_operand:VMDI 0 "s_register_operand" "") + (match_operand:VMDI 1 "s_register_operand" "") + (match_operand:<V_elem> 2 "s_register_operand" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_NEON" +{ + rtx tmp = gen_reg_rtx (<MODE>mode); + 
emit_insn (gen_neon_vset_lane<mode> (tmp, operands[2], tmp, const0_rtx)); + emit_insn (gen_neon_vqdmulh_lane<mode> (operands[0], operands[1], tmp, + const0_rtx, operands[3])); + DONE; +}) + +(define_expand "neon_vqdmulh_n<mode>" + [(match_operand:VMQI 0 "s_register_operand" "") + (match_operand:VMQI 1 "s_register_operand" "") + (match_operand:<V_elem> 2 "s_register_operand" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_NEON" +{ + rtx tmp = gen_reg_rtx (<V_HALF>mode); + emit_insn (gen_neon_vset_lane<V_half> (tmp, operands[2], tmp, const0_rtx)); + emit_insn (gen_neon_vqdmulh_lane<mode> (operands[0], operands[1], tmp, + const0_rtx, operands[3])); + DONE; +}) + +(define_expand "neon_vmla_n<mode>" + [(match_operand:VMD 0 "s_register_operand" "") + (match_operand:VMD 1 "s_register_operand" "") + (match_operand:VMD 2 "s_register_operand" "") + (match_operand:<V_elem> 3 "s_register_operand" "") + (match_operand:SI 4 "immediate_operand" "")] + "TARGET_NEON" +{ + rtx tmp = gen_reg_rtx (<MODE>mode); + emit_insn (gen_neon_vset_lane<mode> (tmp, operands[3], tmp, const0_rtx)); + emit_insn (gen_neon_vmla_lane<mode> (operands[0], operands[1], operands[2], + tmp, const0_rtx, operands[4])); + DONE; +}) + +(define_expand "neon_vmla_n<mode>" + [(match_operand:VMQ 0 "s_register_operand" "") + (match_operand:VMQ 1 "s_register_operand" "") + (match_operand:VMQ 2 "s_register_operand" "") + (match_operand:<V_elem> 3 "s_register_operand" "") + (match_operand:SI 4 "immediate_operand" "")] + "TARGET_NEON" +{ + rtx tmp = gen_reg_rtx (<V_HALF>mode); + emit_insn (gen_neon_vset_lane<V_half> (tmp, operands[3], tmp, const0_rtx)); + emit_insn (gen_neon_vmla_lane<mode> (operands[0], operands[1], operands[2], + tmp, const0_rtx, operands[4])); + DONE; +}) + +(define_expand "neon_vmlal_n<mode>" + [(match_operand:<V_widen> 0 "s_register_operand" "") + (match_operand:<V_widen> 1 "s_register_operand" "") + (match_operand:VMDI 2 "s_register_operand" "") + (match_operand:<V_elem> 3 "s_register_operand" "") + (match_operand:SI 4 "immediate_operand" "")] + "TARGET_NEON" +{ + rtx tmp = gen_reg_rtx (<MODE>mode); + emit_insn (gen_neon_vset_lane<mode> (tmp, operands[3], tmp, const0_rtx)); + emit_insn (gen_neon_vmlal_lane<mode> (operands[0], operands[1], operands[2], + tmp, const0_rtx, operands[4])); + DONE; +}) + +(define_expand "neon_vqdmlal_n<mode>" + [(match_operand:<V_widen> 0 "s_register_operand" "") + (match_operand:<V_widen> 1 "s_register_operand" "") + (match_operand:VMDI 2 "s_register_operand" "") + (match_operand:<V_elem> 3 "s_register_operand" "") + (match_operand:SI 4 "immediate_operand" "")] + "TARGET_NEON" +{ + rtx tmp = gen_reg_rtx (<MODE>mode); + emit_insn (gen_neon_vset_lane<mode> (tmp, operands[3], tmp, const0_rtx)); + emit_insn (gen_neon_vqdmlal_lane<mode> (operands[0], operands[1], operands[2], + tmp, const0_rtx, operands[4])); + DONE; +}) + +(define_expand "neon_vmls_n<mode>" + [(match_operand:VMD 0 "s_register_operand" "") + (match_operand:VMD 1 "s_register_operand" "") + (match_operand:VMD 2 "s_register_operand" "") + (match_operand:<V_elem> 3 "s_register_operand" "") + (match_operand:SI 4 "immediate_operand" "")] + "TARGET_NEON" +{ + rtx tmp = gen_reg_rtx (<MODE>mode); + emit_insn (gen_neon_vset_lane<mode> (tmp, operands[3], tmp, const0_rtx)); + emit_insn (gen_neon_vmls_lane<mode> (operands[0], operands[1], operands[2], + tmp, const0_rtx, operands[4])); + DONE; +}) + +(define_expand "neon_vmls_n<mode>" + [(match_operand:VMQ 0 "s_register_operand" "") + (match_operand:VMQ 1 "s_register_operand" "") + 
(match_operand:VMQ 2 "s_register_operand" "") + (match_operand:<V_elem> 3 "s_register_operand" "") + (match_operand:SI 4 "immediate_operand" "")] + "TARGET_NEON" +{ + rtx tmp = gen_reg_rtx (<V_HALF>mode); + emit_insn (gen_neon_vset_lane<V_half> (tmp, operands[3], tmp, const0_rtx)); + emit_insn (gen_neon_vmls_lane<mode> (operands[0], operands[1], operands[2], + tmp, const0_rtx, operands[4])); + DONE; +}) + +(define_expand "neon_vmlsl_n<mode>" + [(match_operand:<V_widen> 0 "s_register_operand" "") + (match_operand:<V_widen> 1 "s_register_operand" "") + (match_operand:VMDI 2 "s_register_operand" "") + (match_operand:<V_elem> 3 "s_register_operand" "") + (match_operand:SI 4 "immediate_operand" "")] + "TARGET_NEON" +{ + rtx tmp = gen_reg_rtx (<MODE>mode); + emit_insn (gen_neon_vset_lane<mode> (tmp, operands[3], tmp, const0_rtx)); + emit_insn (gen_neon_vmlsl_lane<mode> (operands[0], operands[1], operands[2], + tmp, const0_rtx, operands[4])); + DONE; +}) + +(define_expand "neon_vqdmlsl_n<mode>" + [(match_operand:<V_widen> 0 "s_register_operand" "") + (match_operand:<V_widen> 1 "s_register_operand" "") + (match_operand:VMDI 2 "s_register_operand" "") + (match_operand:<V_elem> 3 "s_register_operand" "") + (match_operand:SI 4 "immediate_operand" "")] + "TARGET_NEON" +{ + rtx tmp = gen_reg_rtx (<MODE>mode); + emit_insn (gen_neon_vset_lane<mode> (tmp, operands[3], tmp, const0_rtx)); + emit_insn (gen_neon_vqdmlsl_lane<mode> (operands[0], operands[1], operands[2], + tmp, const0_rtx, operands[4])); + DONE; +}) + +(define_insn "neon_vext<mode>" + [(set (match_operand:VDQX 0 "s_register_operand" "=w") + (unspec:VDQX [(match_operand:VDQX 1 "s_register_operand" "w") + (match_operand:VDQX 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VEXT))] + "TARGET_NEON" + "vext.<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2, %3") + +(define_insn "neon_vrev64<mode>" + [(set (match_operand:VDQ 0 "s_register_operand" "=w") + (unspec:VDQ [(match_operand:VDQ 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VREV64))] + "TARGET_NEON" + "vrev64.<V_sz_elem>\t%<V_reg>0, %<V_reg>1") + +(define_insn "neon_vrev32<mode>" + [(set (match_operand:VX 0 "s_register_operand" "=w") + (unspec:VX [(match_operand:VX 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VREV32))] + "TARGET_NEON" + "vrev32.<V_sz_elem>\t%<V_reg>0, %<V_reg>1") + +(define_insn "neon_vrev16<mode>" + [(set (match_operand:VE 0 "s_register_operand" "=w") + (unspec:VE [(match_operand:VE 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_VREV16))] + "TARGET_NEON" + "vrev16.<V_sz_elem>\t%<V_reg>0, %<V_reg>1") + +; vbsl_* intrinsics may compile to any of vbsl/vbif/vbit depending on register +; allocation. 
For an intrinsic of form: +; rD = vbsl_* (rS, rN, rM) +; We can use any of: +; vbsl rS, rN, rM (if D = S) +; vbit rD, rN, rS (if D = M, so 1-bits in rS choose bits from rN, else rM) +; vbif rD, rM, rS (if D = N, so 0-bits in rS choose bits from rM, else rN) + +(define_insn "neon_vbsl<mode>_internal" + [(set (match_operand:VDQX 0 "s_register_operand" "=w,w,w") + (unspec:VDQX [(match_operand:VDQX 1 "s_register_operand" " 0,w,w") + (match_operand:VDQX 2 "s_register_operand" " w,w,0") + (match_operand:VDQX 3 "s_register_operand" " w,0,w")] + UNSPEC_VBSL))] + "TARGET_NEON" + "@ + vbsl\t%<V_reg>0, %<V_reg>2, %<V_reg>3 + vbit\t%<V_reg>0, %<V_reg>2, %<V_reg>1 + vbif\t%<V_reg>0, %<V_reg>3, %<V_reg>1") + +(define_expand "neon_vbsl<mode>" + [(set (match_operand:VDQX 0 "s_register_operand" "") + (unspec:VDQX [(match_operand:<V_cmp_result> 1 "s_register_operand" "") + (match_operand:VDQX 2 "s_register_operand" "") + (match_operand:VDQX 3 "s_register_operand" "")] + UNSPEC_VBSL))] + "TARGET_NEON" +{ + /* We can't alias operands together if they have different modes. */ + operands[1] = gen_lowpart (<MODE>mode, operands[1]); +}) + +(define_insn "neon_vshl<mode>" + [(set (match_operand:VDQIX 0 "s_register_operand" "=w") + (unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "w") + (match_operand:VDQIX 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VSHL))] + "TARGET_NEON" + "v%O3shl.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vqshl<mode>" + [(set (match_operand:VDQIX 0 "s_register_operand" "=w") + (unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "w") + (match_operand:VDQIX 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VQSHL))] + "TARGET_NEON" + "vq%O3shl.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2") + +(define_insn "neon_vshr_n<mode>" + [(set (match_operand:VDQIX 0 "s_register_operand" "=w") + (unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VSHR_N))] + "TARGET_NEON" + "v%O3shr.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %2") + +(define_insn "neon_vshrn_n<mode>" + [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w") + (unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VSHRN_N))] + "TARGET_NEON" + "v%O3shrn.<V_if_elem>\t%P0, %q1, %2") + +(define_insn "neon_vqshrn_n<mode>" + [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w") + (unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VQSHRN_N))] + "TARGET_NEON" + "vq%O3shrn.%T3%#<V_sz_elem>\t%P0, %q1, %2") + +(define_insn "neon_vqshrun_n<mode>" + [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w") + (unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VQSHRUN_N))] + "TARGET_NEON" + "vq%O3shrun.%T3%#<V_sz_elem>\t%P0, %q1, %2") + +(define_insn "neon_vshl_n<mode>" + [(set (match_operand:VDQIX 0 "s_register_operand" "=w") + (unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VSHL_N))] + "TARGET_NEON" + "vshl.<V_if_elem>\t%<V_reg>0, %<V_reg>1, 
%2") + +(define_insn "neon_vqshl_n<mode>" + [(set (match_operand:VDQIX 0 "s_register_operand" "=w") + (unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VQSHL_N))] + "TARGET_NEON" + "vqshl.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %2") + +(define_insn "neon_vqshlu_n<mode>" + [(set (match_operand:VDQIX 0 "s_register_operand" "=w") + (unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VQSHLU_N))] + "TARGET_NEON" + "vqshlu.%T3%#<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %2") + +(define_insn "neon_vshll_n<mode>" + [(set (match_operand:<V_widen> 0 "s_register_operand" "=w") + (unspec:<V_widen> [(match_operand:VW 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VSHLL_N))] + "TARGET_NEON" + "vshll.%T3%#<V_sz_elem>\t%q0, %P1, %2") + +(define_insn "neon_vsra_n<mode>" + [(set (match_operand:VDQIX 0 "s_register_operand" "=w") + (unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "0") + (match_operand:VDQIX 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i") + (match_operand:SI 4 "immediate_operand" "i")] + UNSPEC_VSRA_N))] + "TARGET_NEON" + "v%O4sra.%T4%#<V_sz_elem>\t%<V_reg>0, %<V_reg>2, %3") + +(define_insn "neon_vsri_n<mode>" + [(set (match_operand:VDQIX 0 "s_register_operand" "=w") + (unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "0") + (match_operand:VDQIX 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VSRI))] + "TARGET_NEON" + "vsri.<V_sz_elem>\t%<V_reg>0, %<V_reg>2, %3") + +(define_insn "neon_vsli_n<mode>" + [(set (match_operand:VDQIX 0 "s_register_operand" "=w") + (unspec:VDQIX [(match_operand:VDQIX 1 "s_register_operand" "0") + (match_operand:VDQIX 2 "s_register_operand" "w") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VSLI))] + "TARGET_NEON" + "vsli.<V_sz_elem>\t%<V_reg>0, %<V_reg>2, %3") + +(define_insn "neon_vtbl1v8qi" + [(set (match_operand:V8QI 0 "s_register_operand" "=w") + (unspec:V8QI [(match_operand:V8QI 1 "s_register_operand" "w") + (match_operand:V8QI 2 "s_register_operand" "w")] + UNSPEC_VTBL))] + "TARGET_NEON" + "vtbl.8\t%P0, {%P1}, %P2") + +(define_insn "neon_vtbl2v8qi" + [(set (match_operand:V8QI 0 "s_register_operand" "=w") + (unspec:V8QI [(match_operand:TI 1 "s_register_operand" "w") + (match_operand:V8QI 2 "s_register_operand" "w")] + UNSPEC_VTBL))] + "TARGET_NEON" +{ + rtx ops[4]; + int tabbase = REGNO (operands[1]); + + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (V8QImode, tabbase); + ops[2] = gen_rtx_REG (V8QImode, tabbase + 2); + ops[3] = operands[2]; + output_asm_insn ("vtbl.8\t%P0, {%P1, %P2}, %P3", ops); + + return ""; +}) + +(define_insn "neon_vtbl3v8qi" + [(set (match_operand:V8QI 0 "s_register_operand" "=w") + (unspec:V8QI [(match_operand:EI 1 "s_register_operand" "w") + (match_operand:V8QI 2 "s_register_operand" "w")] + UNSPEC_VTBL))] + "TARGET_NEON" +{ + rtx ops[5]; + int tabbase = REGNO (operands[1]); + + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (V8QImode, tabbase); + ops[2] = gen_rtx_REG (V8QImode, tabbase + 2); + ops[3] = gen_rtx_REG (V8QImode, tabbase + 4); + ops[4] = operands[2]; + output_asm_insn ("vtbl.8\t%P0, {%P1, %P2, %P3}, %P4", ops); + + return ""; +}) + +(define_insn "neon_vtbl4v8qi" + [(set (match_operand:V8QI 0 "s_register_operand" "=w") + 
(unspec:V8QI [(match_operand:OI 1 "s_register_operand" "w") + (match_operand:V8QI 2 "s_register_operand" "w")] + UNSPEC_VTBL))] + "TARGET_NEON" +{ + rtx ops[6]; + int tabbase = REGNO (operands[1]); + + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (V8QImode, tabbase); + ops[2] = gen_rtx_REG (V8QImode, tabbase + 2); + ops[3] = gen_rtx_REG (V8QImode, tabbase + 4); + ops[4] = gen_rtx_REG (V8QImode, tabbase + 6); + ops[5] = operands[2]; + output_asm_insn ("vtbl.8\t%P0, {%P1, %P2, %P3, %P4}, %P5", ops); + + return ""; +}) + +(define_insn "neon_vtbx1v8qi" + [(set (match_operand:V8QI 0 "s_register_operand" "=w") + (unspec:V8QI [(match_operand:V8QI 1 "s_register_operand" "0") + (match_operand:V8QI 2 "s_register_operand" "w") + (match_operand:V8QI 3 "s_register_operand" "w")] + UNSPEC_VTBX))] + "TARGET_NEON" + "vtbx.8\t%P0, {%P2}, %P3") + +(define_insn "neon_vtbx2v8qi" + [(set (match_operand:V8QI 0 "s_register_operand" "=w") + (unspec:V8QI [(match_operand:V8QI 1 "s_register_operand" "0") + (match_operand:TI 2 "s_register_operand" "w") + (match_operand:V8QI 3 "s_register_operand" "w")] + UNSPEC_VTBX))] + "TARGET_NEON" +{ + rtx ops[4]; + int tabbase = REGNO (operands[2]); + + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (V8QImode, tabbase); + ops[2] = gen_rtx_REG (V8QImode, tabbase + 2); + ops[3] = operands[3]; + output_asm_insn ("vtbx.8\t%P0, {%P1, %P2}, %P3", ops); + + return ""; +}) + +(define_insn "neon_vtbx3v8qi" + [(set (match_operand:V8QI 0 "s_register_operand" "=w") + (unspec:V8QI [(match_operand:V8QI 1 "s_register_operand" "0") + (match_operand:EI 2 "s_register_operand" "w") + (match_operand:V8QI 3 "s_register_operand" "w")] + UNSPEC_VTBX))] + "TARGET_NEON" +{ + rtx ops[5]; + int tabbase = REGNO (operands[2]); + + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (V8QImode, tabbase); + ops[2] = gen_rtx_REG (V8QImode, tabbase + 2); + ops[3] = gen_rtx_REG (V8QImode, tabbase + 4); + ops[4] = operands[3]; + output_asm_insn ("vtbx.8\t%P0, {%P1, %P2, %P3}, %P4", ops); + + return ""; +}) + +(define_insn "neon_vtbx4v8qi" + [(set (match_operand:V8QI 0 "s_register_operand" "=w") + (unspec:V8QI [(match_operand:V8QI 1 "s_register_operand" "0") + (match_operand:OI 2 "s_register_operand" "w") + (match_operand:V8QI 3 "s_register_operand" "w")] + UNSPEC_VTBX))] + "TARGET_NEON" +{ + rtx ops[6]; + int tabbase = REGNO (operands[2]); + + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (V8QImode, tabbase); + ops[2] = gen_rtx_REG (V8QImode, tabbase + 2); + ops[3] = gen_rtx_REG (V8QImode, tabbase + 4); + ops[4] = gen_rtx_REG (V8QImode, tabbase + 6); + ops[5] = operands[3]; + output_asm_insn ("vtbx.8\t%P0, {%P1, %P2, %P3, %P4}, %P5", ops); + + return ""; +}) + +(define_insn "neon_vtrn<mode>_internal" + [(set (match_operand:VDQW 0 "s_register_operand" "=w") + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] + UNSPEC_VTRN1)) + (set (match_operand:VDQW 2 "s_register_operand" "=w") + (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] + UNSPEC_VTRN2))] + "TARGET_NEON" + "vtrn.<V_sz_elem>\t%<V_reg>0, %<V_reg>2") + +(define_expand "neon_vtrn<mode>" + [(match_operand:SI 0 "s_register_operand" "r") + (match_operand:VDQW 1 "s_register_operand" "w") + (match_operand:VDQW 2 "s_register_operand" "w")] + "TARGET_NEON" +{ + neon_emit_pair_result_insn (<MODE>mode, gen_neon_vtrn<mode>_internal, + operands[0], operands[1], operands[2]); + DONE; +}) + +(define_insn "neon_vzip<mode>_internal" + [(set (match_operand:VDQW 0 "s_register_operand" "=w") + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] + 
UNSPEC_VZIP1)) + (set (match_operand:VDQW 2 "s_register_operand" "=w") + (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] + UNSPEC_VZIP2))] + "TARGET_NEON" + "vzip.<V_sz_elem>\t%<V_reg>0, %<V_reg>2") + +(define_expand "neon_vzip<mode>" + [(match_operand:SI 0 "s_register_operand" "r") + (match_operand:VDQW 1 "s_register_operand" "w") + (match_operand:VDQW 2 "s_register_operand" "w")] + "TARGET_NEON" +{ + neon_emit_pair_result_insn (<MODE>mode, gen_neon_vzip<mode>_internal, + operands[0], operands[1], operands[2]); + DONE; +}) + +(define_insn "neon_vuzp<mode>_internal" + [(set (match_operand:VDQW 0 "s_register_operand" "=w") + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] + UNSPEC_VUZP1)) + (set (match_operand:VDQW 2 "s_register_operand" "=w") + (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] + UNSPEC_VUZP2))] + "TARGET_NEON" + "vuzp.<V_sz_elem>\t%<V_reg>0, %<V_reg>2") + +(define_expand "neon_vuzp<mode>" + [(match_operand:SI 0 "s_register_operand" "r") + (match_operand:VDQW 1 "s_register_operand" "w") + (match_operand:VDQW 2 "s_register_operand" "w")] + "TARGET_NEON" +{ + neon_emit_pair_result_insn (<MODE>mode, gen_neon_vuzp<mode>_internal, + operands[0], operands[1], operands[2]); + DONE; +}) + +(define_expand "neon_vreinterpretv8qi<mode>" + [(match_operand:V8QI 0 "s_register_operand" "") + (match_operand:VDX 1 "s_register_operand" "")] + "TARGET_NEON" +{ + neon_reinterpret (operands[0], operands[1]); + DONE; +}) + +(define_expand "neon_vreinterpretv4hi<mode>" + [(match_operand:V4HI 0 "s_register_operand" "") + (match_operand:VDX 1 "s_register_operand" "")] + "TARGET_NEON" +{ + neon_reinterpret (operands[0], operands[1]); + DONE; +}) + +(define_expand "neon_vreinterpretv2si<mode>" + [(match_operand:V2SI 0 "s_register_operand" "") + (match_operand:VDX 1 "s_register_operand" "")] + "TARGET_NEON" +{ + neon_reinterpret (operands[0], operands[1]); + DONE; +}) + +(define_expand "neon_vreinterpretv2sf<mode>" + [(match_operand:V2SF 0 "s_register_operand" "") + (match_operand:VDX 1 "s_register_operand" "")] + "TARGET_NEON" +{ + neon_reinterpret (operands[0], operands[1]); + DONE; +}) + +(define_expand "neon_vreinterpretdi<mode>" + [(match_operand:DI 0 "s_register_operand" "") + (match_operand:VDX 1 "s_register_operand" "")] + "TARGET_NEON" +{ + neon_reinterpret (operands[0], operands[1]); + DONE; +}) + +(define_expand "neon_vreinterpretv16qi<mode>" + [(match_operand:V16QI 0 "s_register_operand" "") + (match_operand:VQX 1 "s_register_operand" "")] + "TARGET_NEON" +{ + neon_reinterpret (operands[0], operands[1]); + DONE; +}) + +(define_expand "neon_vreinterpretv8hi<mode>" + [(match_operand:V8HI 0 "s_register_operand" "") + (match_operand:VQX 1 "s_register_operand" "")] + "TARGET_NEON" +{ + neon_reinterpret (operands[0], operands[1]); + DONE; +}) + +(define_expand "neon_vreinterpretv4si<mode>" + [(match_operand:V4SI 0 "s_register_operand" "") + (match_operand:VQX 1 "s_register_operand" "")] + "TARGET_NEON" +{ + neon_reinterpret (operands[0], operands[1]); + DONE; +}) + +(define_expand "neon_vreinterpretv4sf<mode>" + [(match_operand:V4SF 0 "s_register_operand" "") + (match_operand:VQX 1 "s_register_operand" "")] + "TARGET_NEON" +{ + neon_reinterpret (operands[0], operands[1]); + DONE; +}) + +(define_expand "neon_vreinterpretv2di<mode>" + [(match_operand:V2DI 0 "s_register_operand" "") + (match_operand:VQX 1 "s_register_operand" "")] + "TARGET_NEON" +{ + neon_reinterpret (operands[0], operands[1]); + DONE; +}) + +(define_insn "neon_vld1<mode>" + [(set 
(match_operand:VDQX 0 "s_register_operand" "=w") + (unspec:VDQX [(mem:VDQX (match_operand:SI 1 "s_register_operand" "r"))] + UNSPEC_VLD1))] + "TARGET_NEON" + "vld1.<V_sz_elem>\t%h0, [%1]") + +(define_insn "neon_vld1_lane<mode>" + [(set (match_operand:VDX 0 "s_register_operand" "=w") + (unspec:VDX [(mem:<V_elem> (match_operand:SI 1 "s_register_operand" "r")) + (match_operand:VDX 2 "s_register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VLD1_LANE))] + "TARGET_NEON" +{ + HOST_WIDE_INT lane = INTVAL (operands[3]); + HOST_WIDE_INT max = GET_MODE_NUNITS (<MODE>mode); + if (lane < 0 || lane >= max) + error ("lane out of range"); + if (max == 1) + return "vld1.<V_sz_elem>\t%P0, [%1]"; + else + return "vld1.<V_sz_elem>\t{%P0[%c3]}, [%1]"; +}) + +(define_insn "neon_vld1_lane<mode>" + [(set (match_operand:VQX 0 "s_register_operand" "=w") + (unspec:VQX [(mem:<V_elem> (match_operand:SI 1 "s_register_operand" "r")) + (match_operand:VQX 2 "s_register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i")] + UNSPEC_VLD1_LANE))] + "TARGET_NEON" +{ + HOST_WIDE_INT lane = INTVAL (operands[3]); + HOST_WIDE_INT max = GET_MODE_NUNITS (<MODE>mode); + int regno = REGNO (operands[0]); + if (lane < 0 || lane >= max) + error ("lane out of range"); + else if (lane >= max / 2) + { + lane -= max / 2; + regno += 2; + operands[3] = GEN_INT (lane); + } + operands[0] = gen_rtx_REG (<V_HALF>mode, regno); + if (max == 2) + return "vld1.<V_sz_elem>\t%P0, [%1]"; + else + return "vld1.<V_sz_elem>\t{%P0[%c3]}, [%1]"; +}) + +(define_insn "neon_vld1_dup<mode>" + [(set (match_operand:VDX 0 "s_register_operand" "=w") + (unspec:VDX [(mem:<V_elem> (match_operand:SI 1 "s_register_operand" "r"))] + UNSPEC_VLD1_DUP))] + "TARGET_NEON" +{ + if (GET_MODE_NUNITS (<MODE>mode) > 1) + return "vld1.<V_sz_elem>\t{%P0[]}, [%1]"; + else + return "vld1.<V_sz_elem>\t%h0, [%1]"; +}) + +(define_insn "neon_vld1_dup<mode>" + [(set (match_operand:VQX 0 "s_register_operand" "=w") + (unspec:VQX [(mem:<V_elem> (match_operand:SI 1 "s_register_operand" "r"))] + UNSPEC_VLD1_DUP))] + "TARGET_NEON" +{ + if (GET_MODE_NUNITS (<MODE>mode) > 2) + return "vld1.<V_sz_elem>\t{%e0[], %f0[]}, [%1]"; + else + return "vld1.<V_sz_elem>\t%h0, [%1]"; +}) + +(define_insn "neon_vst1<mode>" + [(set (mem:VDQX (match_operand:SI 0 "s_register_operand" "r")) + (unspec:VDQX [(match_operand:VDQX 1 "s_register_operand" "w")] + UNSPEC_VST1))] + "TARGET_NEON" + "vst1.<V_sz_elem>\t%h1, [%0]") + +(define_insn "neon_vst1_lane<mode>" + [(set (mem:<V_elem> (match_operand:SI 0 "s_register_operand" "r")) + (vec_select:<V_elem> + (match_operand:VDX 1 "s_register_operand" "w") + (parallel [(match_operand:SI 2 "neon_lane_number" "i")])))] + "TARGET_NEON" +{ + HOST_WIDE_INT lane = INTVAL (operands[2]); + HOST_WIDE_INT max = GET_MODE_NUNITS (<MODE>mode); + if (lane < 0 || lane >= max) + error ("lane out of range"); + if (max == 1) + return "vst1.<V_sz_elem>\t{%P1}, [%0]"; + else + return "vst1.<V_sz_elem>\t{%P1[%c2]}, [%0]"; +}) + +(define_insn "neon_vst1_lane<mode>" + [(set (mem:<V_elem> (match_operand:SI 0 "s_register_operand" "r")) + (vec_select:<V_elem> + (match_operand:VQX 1 "s_register_operand" "w") + (parallel [(match_operand:SI 2 "neon_lane_number" "i")])))] + "TARGET_NEON" +{ + HOST_WIDE_INT lane = INTVAL (operands[2]); + HOST_WIDE_INT max = GET_MODE_NUNITS (<MODE>mode); + int regno = REGNO (operands[1]); + if (lane < 0 || lane >= max) + error ("lane out of range"); + else if (lane >= max / 2) + { + lane -= max / 2; + regno += 2; + operands[2] = GEN_INT 
(lane); + } + operands[1] = gen_rtx_REG (<V_HALF>mode, regno); + if (max == 2) + return "vst1.<V_sz_elem>\t{%P1}, [%0]"; + else + return "vst1.<V_sz_elem>\t{%P1[%c2]}, [%0]"; +}) + +(define_insn "neon_vld2<mode>" + [(set (match_operand:TI 0 "s_register_operand" "=w") + (unspec:TI [(mem:TI (match_operand:SI 1 "s_register_operand" "r")) + (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD2))] + "TARGET_NEON" +{ + if (<V_sz_elem> == 64) + return "vld1.64\t%h0, [%1]"; + else + return "vld2.<V_sz_elem>\t%h0, [%1]"; +}) + +(define_insn "neon_vld2<mode>" + [(set (match_operand:OI 0 "s_register_operand" "=w") + (unspec:OI [(mem:OI (match_operand:SI 1 "s_register_operand" "r")) + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD2))] + "TARGET_NEON" + "vld2.<V_sz_elem>\t%h0, [%1]") + +(define_insn "neon_vld2_lane<mode>" + [(set (match_operand:TI 0 "s_register_operand" "=w") + (unspec:TI [(mem:<V_two_elem> (match_operand:SI 1 "s_register_operand" "r")) + (match_operand:TI 2 "s_register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD2_LANE))] + "TARGET_NEON" +{ + HOST_WIDE_INT lane = INTVAL (operands[3]); + HOST_WIDE_INT max = GET_MODE_NUNITS (<MODE>mode); + int regno = REGNO (operands[0]); + rtx ops[4]; + if (lane < 0 || lane >= max) + error ("lane out of range"); + ops[0] = gen_rtx_REG (DImode, regno); + ops[1] = gen_rtx_REG (DImode, regno + 2); + ops[2] = operands[1]; + ops[3] = operands[3]; + output_asm_insn ("vld2.<V_sz_elem>\t{%P0[%c3], %P1[%c3]}, [%2]", ops); + return ""; +}) + +(define_insn "neon_vld2_lane<mode>" + [(set (match_operand:OI 0 "s_register_operand" "=w") + (unspec:OI [(mem:<V_two_elem> (match_operand:SI 1 "s_register_operand" "r")) + (match_operand:OI 2 "s_register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VMQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD2_LANE))] + "TARGET_NEON" +{ + HOST_WIDE_INT lane = INTVAL (operands[3]); + HOST_WIDE_INT max = GET_MODE_NUNITS (<MODE>mode); + int regno = REGNO (operands[0]); + rtx ops[4]; + if (lane < 0 || lane >= max) + error ("lane out of range"); + else if (lane >= max / 2) + { + lane -= max / 2; + regno += 2; + } + ops[0] = gen_rtx_REG (DImode, regno); + ops[1] = gen_rtx_REG (DImode, regno + 4); + ops[2] = operands[1]; + ops[3] = GEN_INT (lane); + output_asm_insn ("vld2.<V_sz_elem>\t{%P0[%c3], %P1[%c3]}, [%2]", ops); + return ""; +}) + +(define_insn "neon_vld2_dup<mode>" + [(set (match_operand:TI 0 "s_register_operand" "=w") + (unspec:TI [(mem:<V_two_elem> (match_operand:SI 1 "s_register_operand" "r")) + (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD2_DUP))] + "TARGET_NEON" +{ + if (GET_MODE_NUNITS (<MODE>mode) > 1) + return "vld2.<V_sz_elem>\t{%e0[], %f0[]}, [%1]"; + else + return "vld1.<V_sz_elem>\t%h0, [%1]"; +}) + +(define_insn "neon_vst2<mode>" + [(set (mem:TI (match_operand:SI 0 "s_register_operand" "r")) + (unspec:TI [(match_operand:TI 1 "s_register_operand" "w") + (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VST2))] + "TARGET_NEON" +{ + if (<V_sz_elem> == 64) + return "vst1.64\t%h1, [%0]"; + else + return "vst2.<V_sz_elem>\t%h1, [%0]"; +}) + +(define_insn "neon_vst2<mode>" + [(set (mem:OI (match_operand:SI 0 "s_register_operand" "r")) + (unspec:OI [(match_operand:OI 1 "s_register_operand" "w") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VST2))] + "TARGET_NEON" + "vst2.<V_sz_elem>\t%h1, [%0]") + +(define_insn "neon_vst2_lane<mode>" + [(set 
(mem:<V_two_elem> (match_operand:SI 0 "s_register_operand" "r")) + (unspec:<V_two_elem> + [(match_operand:TI 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VST2_LANE))] + "TARGET_NEON" +{ + HOST_WIDE_INT lane = INTVAL (operands[2]); + HOST_WIDE_INT max = GET_MODE_NUNITS (<MODE>mode); + int regno = REGNO (operands[1]); + rtx ops[4]; + if (lane < 0 || lane >= max) + error ("lane out of range"); + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (DImode, regno); + ops[2] = gen_rtx_REG (DImode, regno + 2); + ops[3] = operands[2]; + output_asm_insn ("vst2.<V_sz_elem>\t{%P1[%c3], %P2[%c3]}, [%0]", ops); + return ""; +}) + +(define_insn "neon_vst2_lane<mode>" + [(set (mem:<V_two_elem> (match_operand:SI 0 "s_register_operand" "r")) + (unspec:<V_two_elem> + [(match_operand:OI 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (unspec:VMQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VST2_LANE))] + "TARGET_NEON" +{ + HOST_WIDE_INT lane = INTVAL (operands[2]); + HOST_WIDE_INT max = GET_MODE_NUNITS (<MODE>mode); + int regno = REGNO (operands[1]); + rtx ops[4]; + if (lane < 0 || lane >= max) + error ("lane out of range"); + else if (lane >= max / 2) + { + lane -= max / 2; + regno += 2; + } + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (DImode, regno); + ops[2] = gen_rtx_REG (DImode, regno + 4); + ops[3] = GEN_INT (lane); + output_asm_insn ("vst2.<V_sz_elem>\t{%P1[%c3], %P2[%c3]}, [%0]", ops); + return ""; +}) + +(define_insn "neon_vld3<mode>" + [(set (match_operand:EI 0 "s_register_operand" "=w") + (unspec:EI [(mem:EI (match_operand:SI 1 "s_register_operand" "r")) + (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD3))] + "TARGET_NEON" +{ + if (<V_sz_elem> == 64) + return "vld1.64\t%h0, [%1]"; + else + return "vld3.<V_sz_elem>\t%h0, [%1]"; +}) + +(define_expand "neon_vld3<mode>" + [(match_operand:CI 0 "s_register_operand" "=w") + (match_operand:SI 1 "s_register_operand" "+r") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + "TARGET_NEON" +{ + emit_insn (gen_neon_vld3qa<mode> (operands[0], operands[0], + operands[1], operands[1])); + emit_insn (gen_neon_vld3qb<mode> (operands[0], operands[0], + operands[1], operands[1])); + DONE; +}) + +(define_insn "neon_vld3qa<mode>" + [(set (match_operand:CI 0 "s_register_operand" "=w") + (unspec:CI [(mem:CI (match_operand:SI 3 "s_register_operand" "2")) + (match_operand:CI 1 "s_register_operand" "0") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD3A)) + (set (match_operand:SI 2 "s_register_operand" "=r") + (plus:SI (match_dup 3) + (const_int 24)))] + "TARGET_NEON" +{ + int regno = REGNO (operands[0]); + rtx ops[4]; + ops[0] = gen_rtx_REG (DImode, regno); + ops[1] = gen_rtx_REG (DImode, regno + 4); + ops[2] = gen_rtx_REG (DImode, regno + 8); + ops[3] = operands[2]; + output_asm_insn ("vld3.<V_sz_elem>\t{%P0, %P1, %P2}, [%3]!", ops); + return ""; +}) + +(define_insn "neon_vld3qb<mode>" + [(set (match_operand:CI 0 "s_register_operand" "=w") + (unspec:CI [(mem:CI (match_operand:SI 3 "s_register_operand" "2")) + (match_operand:CI 1 "s_register_operand" "0") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD3B)) + (set (match_operand:SI 2 "s_register_operand" "=r") + (plus:SI (match_dup 3) + (const_int 24)))] + "TARGET_NEON" +{ + int regno = REGNO (operands[0]); + rtx ops[4]; + ops[0] = gen_rtx_REG (DImode, regno + 2); + ops[1] = gen_rtx_REG (DImode, regno + 6); + ops[2] = gen_rtx_REG (DImode, regno + 10); + ops[3] = 
operands[2]; + output_asm_insn ("vld3.<V_sz_elem>\t{%P0, %P1, %P2}, [%3]!", ops); + return ""; +}) + +(define_insn "neon_vld3_lane<mode>" + [(set (match_operand:EI 0 "s_register_operand" "=w") + (unspec:EI [(mem:<V_three_elem> (match_operand:SI 1 "s_register_operand" "r")) + (match_operand:EI 2 "s_register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD3_LANE))] + "TARGET_NEON" +{ + HOST_WIDE_INT lane = INTVAL (operands[3]); + HOST_WIDE_INT max = GET_MODE_NUNITS (<MODE>mode); + int regno = REGNO (operands[0]); + rtx ops[5]; + if (lane < 0 || lane >= max) + error ("lane out of range"); + ops[0] = gen_rtx_REG (DImode, regno); + ops[1] = gen_rtx_REG (DImode, regno + 2); + ops[2] = gen_rtx_REG (DImode, regno + 4); + ops[3] = operands[1]; + ops[4] = operands[3]; + output_asm_insn ("vld3.<V_sz_elem>\t{%P0[%c4], %P1[%c4], %P2[%c4]}, [%3]", + ops); + return ""; +}) + +(define_insn "neon_vld3_lane<mode>" + [(set (match_operand:CI 0 "s_register_operand" "=w") + (unspec:CI [(mem:<V_three_elem> (match_operand:SI 1 "s_register_operand" "r")) + (match_operand:CI 2 "s_register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VMQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD3_LANE))] + "TARGET_NEON" +{ + HOST_WIDE_INT lane = INTVAL (operands[3]); + HOST_WIDE_INT max = GET_MODE_NUNITS (<MODE>mode); + int regno = REGNO (operands[0]); + rtx ops[5]; + if (lane < 0 || lane >= max) + error ("lane out of range"); + else if (lane >= max / 2) + { + lane -= max / 2; + regno += 2; + } + ops[0] = gen_rtx_REG (DImode, regno); + ops[1] = gen_rtx_REG (DImode, regno + 4); + ops[2] = gen_rtx_REG (DImode, regno + 8); + ops[3] = operands[1]; + ops[4] = GEN_INT (lane); + output_asm_insn ("vld3.<V_sz_elem>\t{%P0[%c4], %P1[%c4], %P2[%c4]}, [%3]", + ops); + return ""; +}) + +(define_insn "neon_vld3_dup<mode>" + [(set (match_operand:EI 0 "s_register_operand" "=w") + (unspec:EI [(mem:<V_three_elem> (match_operand:SI 1 "s_register_operand" "r")) + (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD3_DUP))] + "TARGET_NEON" +{ + if (GET_MODE_NUNITS (<MODE>mode) > 1) + { + int regno = REGNO (operands[0]); + rtx ops[4]; + ops[0] = gen_rtx_REG (DImode, regno); + ops[1] = gen_rtx_REG (DImode, regno + 2); + ops[2] = gen_rtx_REG (DImode, regno + 4); + ops[3] = operands[1]; + output_asm_insn ("vld3.<V_sz_elem>\t{%P0[], %P1[], %P2[]}, [%3]", ops); + return ""; + } + else + return "vld1.<V_sz_elem>\t%h0, [%1]"; +}) + +(define_insn "neon_vst3<mode>" + [(set (mem:EI (match_operand:SI 0 "s_register_operand" "r")) + (unspec:EI [(match_operand:EI 1 "s_register_operand" "w") + (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VST3))] + "TARGET_NEON" +{ + if (<V_sz_elem> == 64) + return "vst1.64\t%h1, [%0]"; + else + return "vst3.<V_sz_elem>\t%h1, [%0]"; +}) + +(define_expand "neon_vst3<mode>" + [(match_operand:SI 0 "s_register_operand" "+r") + (match_operand:CI 1 "s_register_operand" "w") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + "TARGET_NEON" +{ + emit_insn (gen_neon_vst3qa<mode> (operands[0], operands[0], operands[1])); + emit_insn (gen_neon_vst3qb<mode> (operands[0], operands[0], operands[1])); + DONE; +}) + +(define_insn "neon_vst3qa<mode>" + [(set (mem:EI (match_operand:SI 1 "s_register_operand" "0")) + (unspec:EI [(match_operand:CI 2 "s_register_operand" "w") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VST3A)) + (set (match_operand:SI 0 "s_register_operand" "=r") + (plus:SI (match_dup 1) + 
(const_int 24)))] + "TARGET_NEON" +{ + int regno = REGNO (operands[2]); + rtx ops[4]; + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (DImode, regno); + ops[2] = gen_rtx_REG (DImode, regno + 4); + ops[3] = gen_rtx_REG (DImode, regno + 8); + output_asm_insn ("vst3.<V_sz_elem>\t{%P1, %P2, %P3}, [%0]!", ops); + return ""; +}) + +(define_insn "neon_vst3qb<mode>" + [(set (mem:EI (match_operand:SI 1 "s_register_operand" "0")) + (unspec:EI [(match_operand:CI 2 "s_register_operand" "w") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VST3B)) + (set (match_operand:SI 0 "s_register_operand" "=r") + (plus:SI (match_dup 1) + (const_int 24)))] + "TARGET_NEON" +{ + int regno = REGNO (operands[2]); + rtx ops[4]; + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (DImode, regno + 2); + ops[2] = gen_rtx_REG (DImode, regno + 6); + ops[3] = gen_rtx_REG (DImode, regno + 10); + output_asm_insn ("vst3.<V_sz_elem>\t{%P1, %P2, %P3}, [%0]!", ops); + return ""; +}) + +(define_insn "neon_vst3_lane<mode>" + [(set (mem:<V_three_elem> (match_operand:SI 0 "s_register_operand" "r")) + (unspec:<V_three_elem> + [(match_operand:EI 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VST3_LANE))] + "TARGET_NEON" +{ + HOST_WIDE_INT lane = INTVAL (operands[2]); + HOST_WIDE_INT max = GET_MODE_NUNITS (<MODE>mode); + int regno = REGNO (operands[1]); + rtx ops[5]; + if (lane < 0 || lane >= max) + error ("lane out of range"); + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (DImode, regno); + ops[2] = gen_rtx_REG (DImode, regno + 2); + ops[3] = gen_rtx_REG (DImode, regno + 4); + ops[4] = operands[2]; + output_asm_insn ("vst3.<V_sz_elem>\t{%P1[%c4], %P2[%c4], %P3[%c4]}, [%0]", + ops); + return ""; +}) + +(define_insn "neon_vst3_lane<mode>" + [(set (mem:<V_three_elem> (match_operand:SI 0 "s_register_operand" "r")) + (unspec:<V_three_elem> + [(match_operand:CI 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (unspec:VMQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VST3_LANE))] + "TARGET_NEON" +{ + HOST_WIDE_INT lane = INTVAL (operands[2]); + HOST_WIDE_INT max = GET_MODE_NUNITS (<MODE>mode); + int regno = REGNO (operands[1]); + rtx ops[5]; + if (lane < 0 || lane >= max) + error ("lane out of range"); + else if (lane >= max / 2) + { + lane -= max / 2; + regno += 2; + } + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (DImode, regno); + ops[2] = gen_rtx_REG (DImode, regno + 4); + ops[3] = gen_rtx_REG (DImode, regno + 8); + ops[4] = GEN_INT (lane); + output_asm_insn ("vst3.<V_sz_elem>\t{%P1[%c4], %P2[%c4], %P3[%c4]}, [%0]", + ops); + return ""; +}) + +(define_insn "neon_vld4<mode>" + [(set (match_operand:OI 0 "s_register_operand" "=w") + (unspec:OI [(mem:OI (match_operand:SI 1 "s_register_operand" "r")) + (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD4))] + "TARGET_NEON" +{ + if (<V_sz_elem> == 64) + return "vld1.64\t%h0, [%1]"; + else + return "vld4.<V_sz_elem>\t%h0, [%1]"; +}) + +(define_expand "neon_vld4<mode>" + [(match_operand:XI 0 "s_register_operand" "=w") + (match_operand:SI 1 "s_register_operand" "+r") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + "TARGET_NEON" +{ + emit_insn (gen_neon_vld4qa<mode> (operands[0], operands[0], + operands[1], operands[1])); + emit_insn (gen_neon_vld4qb<mode> (operands[0], operands[0], + operands[1], operands[1])); + DONE; +}) + +(define_insn "neon_vld4qa<mode>" + [(set (match_operand:XI 0 "s_register_operand" "=w") + (unspec:XI [(mem:XI (match_operand:SI 3 
"s_register_operand" "2")) + (match_operand:XI 1 "s_register_operand" "0") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD4A)) + (set (match_operand:SI 2 "s_register_operand" "=r") + (plus:SI (match_dup 3) + (const_int 32)))] + "TARGET_NEON" +{ + int regno = REGNO (operands[0]); + rtx ops[5]; + ops[0] = gen_rtx_REG (DImode, regno); + ops[1] = gen_rtx_REG (DImode, regno + 4); + ops[2] = gen_rtx_REG (DImode, regno + 8); + ops[3] = gen_rtx_REG (DImode, regno + 12); + ops[4] = operands[2]; + output_asm_insn ("vld4.<V_sz_elem>\t{%P0, %P1, %P2, %P3}, [%4]!", ops); + return ""; +}) + +(define_insn "neon_vld4qb<mode>" + [(set (match_operand:XI 0 "s_register_operand" "=w") + (unspec:XI [(mem:XI (match_operand:SI 3 "s_register_operand" "2")) + (match_operand:XI 1 "s_register_operand" "0") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD4B)) + (set (match_operand:SI 2 "s_register_operand" "=r") + (plus:SI (match_dup 3) + (const_int 32)))] + "TARGET_NEON" +{ + int regno = REGNO (operands[0]); + rtx ops[5]; + ops[0] = gen_rtx_REG (DImode, regno + 2); + ops[1] = gen_rtx_REG (DImode, regno + 6); + ops[2] = gen_rtx_REG (DImode, regno + 10); + ops[3] = gen_rtx_REG (DImode, regno + 14); + ops[4] = operands[2]; + output_asm_insn ("vld4.<V_sz_elem>\t{%P0, %P1, %P2, %P3}, [%4]!", ops); + return ""; +}) + +(define_insn "neon_vld4_lane<mode>" + [(set (match_operand:OI 0 "s_register_operand" "=w") + (unspec:OI [(mem:<V_four_elem> (match_operand:SI 1 "s_register_operand" "r")) + (match_operand:OI 2 "s_register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD4_LANE))] + "TARGET_NEON" +{ + HOST_WIDE_INT lane = INTVAL (operands[3]); + HOST_WIDE_INT max = GET_MODE_NUNITS (<MODE>mode); + int regno = REGNO (operands[0]); + rtx ops[6]; + if (lane < 0 || lane >= max) + error ("lane out of range"); + ops[0] = gen_rtx_REG (DImode, regno); + ops[1] = gen_rtx_REG (DImode, regno + 2); + ops[2] = gen_rtx_REG (DImode, regno + 4); + ops[3] = gen_rtx_REG (DImode, regno + 6); + ops[4] = operands[1]; + ops[5] = operands[3]; + output_asm_insn ("vld4.<V_sz_elem>\t{%P0[%c5], %P1[%c5], %P2[%c5], %P3[%c5]}, [%4]", + ops); + return ""; +}) + +(define_insn "neon_vld4_lane<mode>" + [(set (match_operand:XI 0 "s_register_operand" "=w") + (unspec:XI [(mem:<V_four_elem> (match_operand:SI 1 "s_register_operand" "r")) + (match_operand:XI 2 "s_register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VMQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD4_LANE))] + "TARGET_NEON" +{ + HOST_WIDE_INT lane = INTVAL (operands[3]); + HOST_WIDE_INT max = GET_MODE_NUNITS (<MODE>mode); + int regno = REGNO (operands[0]); + rtx ops[6]; + if (lane < 0 || lane >= max) + error ("lane out of range"); + else if (lane >= max / 2) + { + lane -= max / 2; + regno += 2; + } + ops[0] = gen_rtx_REG (DImode, regno); + ops[1] = gen_rtx_REG (DImode, regno + 4); + ops[2] = gen_rtx_REG (DImode, regno + 8); + ops[3] = gen_rtx_REG (DImode, regno + 12); + ops[4] = operands[1]; + ops[5] = GEN_INT (lane); + output_asm_insn ("vld4.<V_sz_elem>\t{%P0[%c5], %P1[%c5], %P2[%c5], %P3[%c5]}, [%4]", + ops); + return ""; +}) + +(define_insn "neon_vld4_dup<mode>" + [(set (match_operand:OI 0 "s_register_operand" "=w") + (unspec:OI [(mem:<V_four_elem> (match_operand:SI 1 "s_register_operand" "r")) + (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VLD4_DUP))] + "TARGET_NEON" +{ + if (GET_MODE_NUNITS (<MODE>mode) > 1) + { + int regno = REGNO (operands[0]); 
+ rtx ops[5]; + ops[0] = gen_rtx_REG (DImode, regno); + ops[1] = gen_rtx_REG (DImode, regno + 2); + ops[2] = gen_rtx_REG (DImode, regno + 4); + ops[3] = gen_rtx_REG (DImode, regno + 6); + ops[4] = operands[1]; + output_asm_insn ("vld4.<V_sz_elem>\t{%P0[], %P1[], %P2[], %P3[]}, [%4]", + ops); + return ""; + } + else + return "vld1.<V_sz_elem>\t%h0, [%1]"; +}) + +(define_insn "neon_vst4<mode>" + [(set (mem:OI (match_operand:SI 0 "s_register_operand" "r")) + (unspec:OI [(match_operand:OI 1 "s_register_operand" "w") + (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VST4))] + "TARGET_NEON" +{ + if (<V_sz_elem> == 64) + return "vst1.64\t%h1, [%0]"; + else + return "vst4.<V_sz_elem>\t%h1, [%0]"; +}) + +(define_expand "neon_vst4<mode>" + [(match_operand:SI 0 "s_register_operand" "+r") + (match_operand:XI 1 "s_register_operand" "w") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + "TARGET_NEON" +{ + emit_insn (gen_neon_vst4qa<mode> (operands[0], operands[0], operands[1])); + emit_insn (gen_neon_vst4qb<mode> (operands[0], operands[0], operands[1])); + DONE; +}) + +(define_insn "neon_vst4qa<mode>" + [(set (mem:OI (match_operand:SI 1 "s_register_operand" "0")) + (unspec:OI [(match_operand:XI 2 "s_register_operand" "w") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VST4A)) + (set (match_operand:SI 0 "s_register_operand" "=r") + (plus:SI (match_dup 1) + (const_int 32)))] + "TARGET_NEON" +{ + int regno = REGNO (operands[2]); + rtx ops[5]; + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (DImode, regno); + ops[2] = gen_rtx_REG (DImode, regno + 4); + ops[3] = gen_rtx_REG (DImode, regno + 8); + ops[4] = gen_rtx_REG (DImode, regno + 12); + output_asm_insn ("vst4.<V_sz_elem>\t{%P1, %P2, %P3, %P4}, [%0]!", ops); + return ""; +}) + +(define_insn "neon_vst4qb<mode>" + [(set (mem:OI (match_operand:SI 1 "s_register_operand" "0")) + (unspec:OI [(match_operand:XI 2 "s_register_operand" "w") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VST4B)) + (set (match_operand:SI 0 "s_register_operand" "=r") + (plus:SI (match_dup 1) + (const_int 32)))] + "TARGET_NEON" +{ + int regno = REGNO (operands[2]); + rtx ops[5]; + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (DImode, regno + 2); + ops[2] = gen_rtx_REG (DImode, regno + 6); + ops[3] = gen_rtx_REG (DImode, regno + 10); + ops[4] = gen_rtx_REG (DImode, regno + 14); + output_asm_insn ("vst4.<V_sz_elem>\t{%P1, %P2, %P3, %P4}, [%0]!", ops); + return ""; +}) + +(define_insn "neon_vst4_lane<mode>" + [(set (mem:<V_four_elem> (match_operand:SI 0 "s_register_operand" "r")) + (unspec:<V_four_elem> + [(match_operand:OI 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VST4_LANE))] + "TARGET_NEON" +{ + HOST_WIDE_INT lane = INTVAL (operands[2]); + HOST_WIDE_INT max = GET_MODE_NUNITS (<MODE>mode); + int regno = REGNO (operands[1]); + rtx ops[6]; + if (lane < 0 || lane >= max) + error ("lane out of range"); + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (DImode, regno); + ops[2] = gen_rtx_REG (DImode, regno + 2); + ops[3] = gen_rtx_REG (DImode, regno + 4); + ops[4] = gen_rtx_REG (DImode, regno + 6); + ops[5] = operands[2]; + output_asm_insn ("vst4.<V_sz_elem>\t{%P1[%c5], %P2[%c5], %P3[%c5], %P4[%c5]}, [%0]", + ops); + return ""; +}) + +(define_insn "neon_vst4_lane<mode>" + [(set (mem:<V_four_elem> (match_operand:SI 0 "s_register_operand" "r")) + (unspec:<V_four_elem> + [(match_operand:XI 1 "s_register_operand" "w") + (match_operand:SI 2 "immediate_operand" "i") + 
(unspec:VMQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_VST4_LANE))] + "TARGET_NEON" +{ + HOST_WIDE_INT lane = INTVAL (operands[2]); + HOST_WIDE_INT max = GET_MODE_NUNITS (<MODE>mode); + int regno = REGNO (operands[1]); + rtx ops[6]; + if (lane < 0 || lane >= max) + error ("lane out of range"); + else if (lane >= max / 2) + { + lane -= max / 2; + regno += 2; + } + ops[0] = operands[0]; + ops[1] = gen_rtx_REG (DImode, regno); + ops[2] = gen_rtx_REG (DImode, regno + 4); + ops[3] = gen_rtx_REG (DImode, regno + 8); + ops[4] = gen_rtx_REG (DImode, regno + 12); + ops[5] = GEN_INT (lane); + output_asm_insn ("vst4.<V_sz_elem>\t{%P1[%c5], %P2[%c5], %P3[%c5], %P4[%c5]}, [%0]", + ops); + return ""; +}) + +(define_expand "neon_vand<mode>" + [(match_operand:VDQX 0 "s_register_operand" "") + (match_operand:VDQX 1 "s_register_operand" "") + (match_operand:VDQX 2 "neon_inv_logic_op2" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_NEON" +{ + emit_insn (gen_and<mode>3<V_suf64> (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_expand "neon_vorr<mode>" + [(match_operand:VDQX 0 "s_register_operand" "") + (match_operand:VDQX 1 "s_register_operand" "") + (match_operand:VDQX 2 "neon_logic_op2" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_NEON" +{ + emit_insn (gen_ior<mode>3<V_suf64> (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_expand "neon_veor<mode>" + [(match_operand:VDQX 0 "s_register_operand" "") + (match_operand:VDQX 1 "s_register_operand" "") + (match_operand:VDQX 2 "s_register_operand" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_NEON" +{ + emit_insn (gen_xor<mode>3<V_suf64> (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_expand "neon_vbic<mode>" + [(match_operand:VDQX 0 "s_register_operand" "") + (match_operand:VDQX 1 "s_register_operand" "") + (match_operand:VDQX 2 "neon_logic_op2" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_NEON" +{ + emit_insn (gen_bic<mode>3_neon (operands[0], operands[1], operands[2])); + DONE; +}) + +(define_expand "neon_vorn<mode>" + [(match_operand:VDQX 0 "s_register_operand" "") + (match_operand:VDQX 1 "s_register_operand" "") + (match_operand:VDQX 2 "neon_inv_logic_op2" "") + (match_operand:SI 3 "immediate_operand" "")] + "TARGET_NEON" +{ + emit_insn (gen_orn<mode>3_neon (operands[0], operands[1], operands[2])); + DONE; +}) diff --git a/gcc/config/arm/neon.ml b/gcc/config/arm/neon.ml new file mode 100644 index 0000000..59f6cc9 --- /dev/null +++ b/gcc/config/arm/neon.ml @@ -0,0 +1,1826 @@ +(* Common code for ARM NEON header file, documentation and test case + generators. + + Copyright (C) 2006 Free Software Foundation, Inc. + Contributed by CodeSourcery. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it under + the terms of the GNU General Public License as published by the Free + Software Foundation; either version 2, or (at your option) any later + version. + + GCC is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING. If not, write to the Free + Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA + 02110-1301, USA. *) + +(* Shorthand types for vector elements. 
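S/U/F/P are the user-visible signed, unsigned, float and polynomial element types; I and B are the sign-agnostic integer and untyped-bits forms used when signedness does not matter; Conv and Cast pair a destination element with a source element for conversion operations.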
*) +type elts = S8 | S16 | S32 | S64 | F32 | U8 | U16 | U32 | U64 | P8 | P16 + | I8 | I16 | I32 | I64 | B8 | B16 | B32 | B64 | Conv of elts * elts + | Cast of elts * elts | NoElts + +type eltclass = Signed | Unsigned | Float | Poly | Int | Bits + | ConvClass of eltclass * eltclass | NoType + +(* These vector types correspond directly to C types. *) +type vectype = T_int8x8 | T_int8x16 + | T_int16x4 | T_int16x8 + | T_int32x2 | T_int32x4 + | T_int64x1 | T_int64x2 + | T_uint8x8 | T_uint8x16 + | T_uint16x4 | T_uint16x8 + | T_uint32x2 | T_uint32x4 + | T_uint64x1 | T_uint64x2 + | T_float32x2 | T_float32x4 + | T_poly8x8 | T_poly8x16 + | T_poly16x4 | T_poly16x8 + | T_immediate of int * int + | T_int8 | T_int16 + | T_int32 | T_int64 + | T_uint8 | T_uint16 + | T_uint32 | T_uint64 + | T_poly8 | T_poly16 + | T_float32 | T_arrayof of int * vectype + | T_ptrto of vectype | T_const of vectype + | T_void | T_intQI + | T_intHI | T_intSI + | T_intDI + +(* The meanings of the following are: + TImode : "Tetra", two registers (four words). + EImode : "hExa", three registers (six words). + OImode : "Octa", four registers (eight words). + CImode : "dodeCa", six registers (twelve words). + XImode : "heXadeca", eight registers (sixteen words). +*) + +type inttype = B_TImode | B_EImode | B_OImode | B_CImode | B_XImode + +type shape_elt = Dreg | Qreg | Corereg | Immed | VecArray of int * shape_elt + | PtrTo of shape_elt | CstPtrTo of shape_elt + (* These next ones are used only in the test generator. *) + | Element_of_dreg (* Used for "lane" variants. *) + | Element_of_qreg (* Likewise. *) + | All_elements_of_dreg (* Used for "dup" variants. *) + +type shape_form = All of int * shape_elt + | Long + | Long_noreg of shape_elt + | Wide + | Wide_noreg of shape_elt + | Narrow + | Long_imm + | Narrow_imm + | Binary_imm of shape_elt + | Use_operands of shape_elt array + | By_scalar of shape_elt + | Unary_scalar of shape_elt + | Wide_lane + | Wide_scalar + | Pair_result of shape_elt + +type arity = Arity0 of vectype + | Arity1 of vectype * vectype + | Arity2 of vectype * vectype * vectype + | Arity3 of vectype * vectype * vectype * vectype + | Arity4 of vectype * vectype * vectype * vectype * vectype + +type vecmode = V8QI | V4HI | V2SI | V2SF | DI + | V16QI | V8HI | V4SI | V4SF | V2DI + | QI | HI | SI | SF + +type opcode = + (* Binary ops. *) + Vadd + | Vmul + | Vmla + | Vmls + | Vsub + | Vceq + | Vcge + | Vcgt + | Vcle + | Vclt + | Vcage + | Vcagt + | Vcale + | Vcalt + | Vtst + | Vabd + | Vaba + | Vmax + | Vmin + | Vpadd + | Vpada + | Vpmax + | Vpmin + | Vrecps + | Vrsqrts + | Vshl + | Vshr_n + | Vshl_n + | Vsra_n + | Vsri + | Vsli + (* Logic binops. *) + | Vand + | Vorr + | Veor + | Vbic + | Vorn + | Vbsl + (* Ops with scalar. *) + | Vmul_lane + | Vmla_lane + | Vmls_lane + | Vmul_n + | Vmla_n + | Vmls_n + | Vmull_n + | Vmull_lane + | Vqdmull_n + | Vqdmull_lane + | Vqdmulh_n + | Vqdmulh_lane + (* Unary ops. *) + | Vabs + | Vneg + | Vcls + | Vclz + | Vcnt + | Vrecpe + | Vrsqrte + | Vmvn + (* Vector extract. *) + | Vext + (* Reverse elements. *) + | Vrev64 + | Vrev32 + | Vrev16 + (* Transposition ops. *) + | Vtrn + | Vzip + | Vuzp + (* Loads and stores (VLD1/VST1/VLD2...), elements and structures. *) + | Vldx of int + | Vstx of int + | Vldx_lane of int + | Vldx_dup of int + | Vstx_lane of int + (* Set/extract lanes from a vector. *) + | Vget_lane + | Vset_lane + (* Initialise vector from bit pattern. *) + | Vcreate + (* Set all lanes to same value. *) + | Vdup_n + | Vmov_n (* Is this the same? 
*) + (* Duplicate scalar to all lanes of vector. *) + | Vdup_lane + (* Combine vectors. *) + | Vcombine + (* Get quadword high/low parts. *) + | Vget_high + | Vget_low + (* Convert vectors. *) + | Vcvt + | Vcvt_n + (* Narrow/lengthen vectors. *) + | Vmovn + | Vmovl + (* Table lookup. *) + | Vtbl of int + | Vtbx of int + (* Reinterpret casts. *) + | Vreinterp + +(* Features used for documentation, to distinguish between some instruction + variants, and to signal special requirements (e.g. swapping arguments). *) + +type features = + Halving + | Rounding + | Saturating + | Dst_unsign + | High_half + | Doubling + | Flipped of string (* Builtin name to use with flipped arguments. *) + | InfoWord (* Pass an extra word for signage/rounding etc. (always passed + for All _, Long, Wide, Narrow shape_forms. *) + | ReturnPtr (* Pass explicit pointer to return value as first argument. *) + (* A specification as to the shape of instruction expected upon + disassembly, used if it differs from the shape used to build the + intrinsic prototype. Multiple entries in the constructor's argument + indicate that the intrinsic expands to more than one assembly + instruction, each with a corresponding shape specified here. *) + | Disassembles_as of shape_form list + | Builtin_name of string (* Override the name of the builtin. *) + (* Override the name of the instruction. If more than one name + is specified, it means that the instruction can have any of those + names. *) + | Instruction_name of string list + (* Mark that the intrinsic yields no instructions, or expands to yield + behaviour that the test generator cannot test. *) + | No_op + (* Mark that the intrinsic has constant arguments that cannot be set + to the defaults (zero for pointers and one otherwise) in the test + cases. The function supplied must return the integer to be written + into the testcase for the argument number (0-based) supplied to it. *) + | Const_valuator of (int -> int) + +exception MixedMode of elts * elts + +let rec elt_width = function + S8 | U8 | P8 | I8 | B8 -> 8 + | S16 | U16 | P16 | I16 | B16 -> 16 + | S32 | F32 | U32 | I32 | B32 -> 32 + | S64 | U64 | I64 | B64 -> 64 + | Conv (a, b) -> + let wa = elt_width a and wb = elt_width b in + if wa = wb then wa else failwith "element width?" + | Cast (a, b) -> raise (MixedMode (a, b)) + | NoElts -> failwith "No elts" + +let rec elt_class = function + S8 | S16 | S32 | S64 -> Signed + | U8 | U16 | U32 | U64 -> Unsigned + | P8 | P16 -> Poly + | F32 -> Float + | I8 | I16 | I32 | I64 -> Int + | B8 | B16 | B32 | B64 -> Bits + | Conv (a, b) | Cast (a, b) -> ConvClass (elt_class a, elt_class b) + | NoElts -> NoType + +let elt_of_class_width c w = + match c, w with + Signed, 8 -> S8 + | Signed, 16 -> S16 + | Signed, 32 -> S32 + | Signed, 64 -> S64 + | Float, 32 -> F32 + | Unsigned, 8 -> U8 + | Unsigned, 16 -> U16 + | Unsigned, 32 -> U32 + | Unsigned, 64 -> U64 + | Poly, 8 -> P8 + | Poly, 16 -> P16 + | Int, 8 -> I8 + | Int, 16 -> I16 + | Int, 32 -> I32 + | Int, 64 -> I64 + | Bits, 8 -> B8 + | Bits, 16 -> B16 + | Bits, 32 -> B32 + | Bits, 64 -> B64 + | _ -> failwith "Bad element type" + +(* Return unsigned integer element the same width as argument. *) +let unsigned_of_elt elt = + elt_of_class_width Unsigned (elt_width elt) + +let signed_of_elt elt = + elt_of_class_width Signed (elt_width elt) + +(* Return untyped bits element the same width as argument. 
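For example, bits_of_elt S16 and bits_of_elt U16 both yield B16.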
*) +let bits_of_elt elt = + elt_of_class_width Bits (elt_width elt) + +let non_signed_variant = function + S8 -> I8 + | S16 -> I16 + | S32 -> I32 + | S64 -> I64 + | U8 -> I8 + | U16 -> I16 + | U32 -> I32 + | U64 -> I64 + | x -> x + +let poly_unsigned_variant v = + let elclass = match elt_class v with + Poly -> Unsigned + | x -> x in + elt_of_class_width elclass (elt_width v) + +let widen_elt elt = + let w = elt_width elt + and c = elt_class elt in + elt_of_class_width c (w * 2) + +let narrow_elt elt = + let w = elt_width elt + and c = elt_class elt in + elt_of_class_width c (w / 2) + +(* If we're trying to find a mode from a "Use_operands" instruction, use the + last vector operand as the dominant mode used to invoke the correct builtin. + We must stick to this rule in neon.md. *) +let find_key_operand operands = + let rec scan opno = + match operands.(opno) with + Qreg -> Qreg + | Dreg -> Dreg + | VecArray (_, Qreg) -> Qreg + | VecArray (_, Dreg) -> Dreg + | _ -> scan (opno-1) + in + scan ((Array.length operands) - 1) + +let rec mode_of_elt elt shape = + let flt = match elt_class elt with + Float | ConvClass(_, Float) -> true | _ -> false in + let idx = + match elt_width elt with + 8 -> 0 | 16 -> 1 | 32 -> 2 | 64 -> 3 + | _ -> failwith "Bad element width" + in match shape with + All (_, Dreg) | By_scalar Dreg | Pair_result Dreg | Unary_scalar Dreg + | Binary_imm Dreg | Long_noreg Dreg | Wide_noreg Dreg -> + [| V8QI; V4HI; if flt then V2SF else V2SI; DI |].(idx) + | All (_, Qreg) | By_scalar Qreg | Pair_result Qreg | Unary_scalar Qreg + | Binary_imm Qreg | Long_noreg Qreg | Wide_noreg Qreg -> + [| V16QI; V8HI; if flt then V4SF else V4SI; V2DI |].(idx) + | All (_, (Corereg | PtrTo _ | CstPtrTo _)) -> + [| QI; HI; if flt then SF else SI; DI |].(idx) + | Long | Wide | Wide_lane | Wide_scalar + | Long_imm -> + [| V8QI; V4HI; V2SI; DI |].(idx) + | Narrow | Narrow_imm -> [| V16QI; V8HI; V4SI; V2DI |].(idx) + | Use_operands ops -> mode_of_elt elt (All (0, (find_key_operand ops))) + | _ -> failwith "invalid shape" + +(* Modify an element type dependent on the shape of the instruction and the + operand number. *) + +let shapemap shape no = + let ident = fun x -> x in + match shape with + All _ | Use_operands _ | By_scalar _ | Pair_result _ | Unary_scalar _ + | Binary_imm _ -> ident + | Long | Long_noreg _ | Wide_scalar | Long_imm -> + [| widen_elt; ident; ident |].(no) + | Wide | Wide_noreg _ -> [| widen_elt; widen_elt; ident |].(no) + | Wide_lane -> [| widen_elt; ident; ident; ident |].(no) + | Narrow | Narrow_imm -> [| narrow_elt; ident; ident |].(no) + +(* Register type (D/Q) of an operand, based on shape and operand number. 
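For shape Long, say, operand 0 is a Q register while operands 1 and 2 are D registers, matching the widening instructions.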
*) + +let regmap shape no = + match shape with + All (_, reg) | Long_noreg reg | Wide_noreg reg -> reg + | Long -> [| Qreg; Dreg; Dreg |].(no) + | Wide -> [| Qreg; Qreg; Dreg |].(no) + | Narrow -> [| Dreg; Qreg; Qreg |].(no) + | Wide_lane -> [| Qreg; Dreg; Dreg; Immed |].(no) + | Wide_scalar -> [| Qreg; Dreg; Corereg |].(no) + | By_scalar reg -> [| reg; reg; Dreg; Immed |].(no) + | Unary_scalar reg -> [| reg; Dreg; Immed |].(no) + | Pair_result reg -> [| VecArray (2, reg); reg; reg |].(no) + | Binary_imm reg -> [| reg; reg; Immed |].(no) + | Long_imm -> [| Qreg; Dreg; Immed |].(no) + | Narrow_imm -> [| Dreg; Qreg; Immed |].(no) + | Use_operands these -> these.(no) + +let type_for_elt shape elt no = + let elt = (shapemap shape no) elt in + let reg = regmap shape no in + let rec type_for_reg_elt reg elt = + match reg with + Dreg -> + begin match elt with + S8 -> T_int8x8 + | S16 -> T_int16x4 + | S32 -> T_int32x2 + | S64 -> T_int64x1 + | U8 -> T_uint8x8 + | U16 -> T_uint16x4 + | U32 -> T_uint32x2 + | U64 -> T_uint64x1 + | F32 -> T_float32x2 + | P8 -> T_poly8x8 + | P16 -> T_poly16x4 + | _ -> failwith "Bad elt type" + end + | Qreg -> + begin match elt with + S8 -> T_int8x16 + | S16 -> T_int16x8 + | S32 -> T_int32x4 + | S64 -> T_int64x2 + | U8 -> T_uint8x16 + | U16 -> T_uint16x8 + | U32 -> T_uint32x4 + | U64 -> T_uint64x2 + | F32 -> T_float32x4 + | P8 -> T_poly8x16 + | P16 -> T_poly16x8 + | _ -> failwith "Bad elt type" + end + | Corereg -> + begin match elt with + S8 -> T_int8 + | S16 -> T_int16 + | S32 -> T_int32 + | S64 -> T_int64 + | U8 -> T_uint8 + | U16 -> T_uint16 + | U32 -> T_uint32 + | U64 -> T_uint64 + | P8 -> T_poly8 + | P16 -> T_poly16 + | F32 -> T_float32 + | _ -> failwith "Bad elt type" + end + | Immed -> + T_immediate (0, 0) + | VecArray (num, sub) -> + T_arrayof (num, type_for_reg_elt sub elt) + | PtrTo x -> + T_ptrto (type_for_reg_elt x elt) + | CstPtrTo x -> + T_ptrto (T_const (type_for_reg_elt x elt)) + (* Anything else is solely for the use of the test generator. *) + | _ -> assert false + in + type_for_reg_elt reg elt + +(* Return size of a vector type, in bits. *) +let vectype_size = function + T_int8x8 | T_int16x4 | T_int32x2 | T_int64x1 + | T_uint8x8 | T_uint16x4 | T_uint32x2 | T_uint64x1 + | T_float32x2 | T_poly8x8 | T_poly16x4 -> 64 + | T_int8x16 | T_int16x8 | T_int32x4 | T_int64x2 + | T_uint8x16 | T_uint16x8 | T_uint32x4 | T_uint64x2 + | T_float32x4 | T_poly8x16 | T_poly16x8 -> 128 + | _ -> raise Not_found + +let inttype_for_array num elttype = + let eltsize = vectype_size elttype in + let numwords = (num * eltsize) / 32 in + match numwords with + 4 -> B_TImode + | 6 -> B_EImode + | 8 -> B_OImode + | 12 -> B_CImode + | 16 -> B_XImode + | _ -> failwith ("no int type for size " ^ string_of_int numwords) + +(* These functions return pairs of (internal, external) types, where "internal" + types are those seen by GCC, and "external" are those seen by the assembler. + These types aren't necessarily the same, since the intrinsics can munge more + than one C type into each assembler opcode. *) + +let make_sign_invariant func shape elt = + let arity, elt' = func shape elt in + arity, non_signed_variant elt' + +(* Don't restrict any types. *) + +let elts_same make_arity shape elt = + let vtype = type_for_elt shape elt in + make_arity vtype, elt + +(* As sign_invar_*, but when sign matters. 
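For instance, the elts_same_2 variant below, applied to shape All (3, Dreg) with element S16, produces Arity2 (T_int16x4, T_int16x4, T_int16x4) paired with S16.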
*) +let elts_same_io_lane = + elts_same (fun vtype -> Arity4 (vtype 0, vtype 0, vtype 1, vtype 2, vtype 3)) + +let elts_same_io = + elts_same (fun vtype -> Arity3 (vtype 0, vtype 0, vtype 1, vtype 2)) + +let elts_same_2_lane = + elts_same (fun vtype -> Arity3 (vtype 0, vtype 1, vtype 2, vtype 3)) + +let elts_same_3 = elts_same_2_lane + +let elts_same_2 = + elts_same (fun vtype -> Arity2 (vtype 0, vtype 1, vtype 2)) + +let elts_same_1 = + elts_same (fun vtype -> Arity1 (vtype 0, vtype 1)) + +(* Use for signed/unsigned invariant operations (i.e. where the operation + doesn't depend on the sign of the data. *) + +let sign_invar_io_lane = make_sign_invariant elts_same_io_lane +let sign_invar_io = make_sign_invariant elts_same_io +let sign_invar_2_lane = make_sign_invariant elts_same_2_lane +let sign_invar_2 = make_sign_invariant elts_same_2 +let sign_invar_1 = make_sign_invariant elts_same_1 + +(* Sign-sensitive comparison. *) + +let cmp_sign_matters shape elt = + let vtype = type_for_elt shape elt + and rtype = type_for_elt shape (unsigned_of_elt elt) 0 in + Arity2 (rtype, vtype 1, vtype 2), elt + +(* Signed/unsigned invariant comparison. *) + +let cmp_sign_invar shape elt = + let shape', elt' = cmp_sign_matters shape elt in + let elt'' = + match non_signed_variant elt' with + P8 -> I8 + | x -> x + in + shape', elt'' + +(* Comparison (VTST) where only the element width matters. *) + +let cmp_bits shape elt = + let vtype = type_for_elt shape elt + and rtype = type_for_elt shape (unsigned_of_elt elt) 0 + and bits_only = bits_of_elt elt in + Arity2 (rtype, vtype 1, vtype 2), bits_only + +let reg_shift shape elt = + let vtype = type_for_elt shape elt + and op2type = type_for_elt shape (signed_of_elt elt) 2 in + Arity2 (vtype 0, vtype 1, op2type), elt + +(* Genericised constant-shift type-generating function. *) + +let const_shift mkimm ?arity ?result shape elt = + let op2type = (shapemap shape 2) elt in + let op2width = elt_width op2type in + let op2 = mkimm op2width + and op1 = type_for_elt shape elt 1 + and r_elt = + match result with + None -> elt + | Some restriction -> restriction elt in + let rtype = type_for_elt shape r_elt 0 in + match arity with + None -> Arity2 (rtype, op1, op2), elt + | Some mkarity -> mkarity rtype op1 op2, elt + +(* Use for immediate right-shifts. *) + +let shift_right shape elt = + const_shift (fun imm -> T_immediate (1, imm)) shape elt + +let shift_right_acc shape elt = + const_shift (fun imm -> T_immediate (1, imm)) + ~arity:(fun dst op1 op2 -> Arity3 (dst, dst, op1, op2)) shape elt + +(* Use for immediate right-shifts when the operation doesn't care about + signedness. *) + +let shift_right_sign_invar = + make_sign_invariant shift_right + +(* Immediate right-shift; result is unsigned even when operand is signed. *) + +let shift_right_to_uns shape elt = + const_shift (fun imm -> T_immediate (1, imm)) ~result:unsigned_of_elt + shape elt + +(* Immediate left-shift. *) + +let shift_left shape elt = + const_shift (fun imm -> T_immediate (0, imm - 1)) shape elt + +(* Immediate left-shift, unsigned result. *) + +let shift_left_to_uns shape elt = + const_shift (fun imm -> T_immediate (0, imm - 1)) ~result:unsigned_of_elt + shape elt + +(* Immediate left-shift, don't care about signs. *) + +let shift_left_sign_invar = + make_sign_invariant shift_left + +(* Shift left/right and insert: only element size matters. 
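The returned element is the bits-only form, so e.g. the P8 and U8 variants of vsri_n both resolve to a B8-typed builtin.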
*) + +let shift_insert shape elt = + let arity, elt = + const_shift (fun imm -> T_immediate (1, imm)) + ~arity:(fun dst op1 op2 -> Arity3 (dst, dst, op1, op2)) shape elt in + arity, bits_of_elt elt + +(* Get/set lane. *) + +let get_lane shape elt = + let vtype = type_for_elt shape elt in + Arity2 (vtype 0, vtype 1, vtype 2), + (match elt with P8 -> U8 | P16 -> U16 | x -> x) + +let set_lane shape elt = + let vtype = type_for_elt shape elt in + Arity3 (vtype 0, vtype 1, vtype 2, vtype 3), bits_of_elt elt + +let set_lane_notype shape elt = + let vtype = type_for_elt shape elt in + Arity3 (vtype 0, vtype 1, vtype 2, vtype 3), NoElts + +let create_vector shape elt = + let vtype = type_for_elt shape U64 1 + and rtype = type_for_elt shape elt 0 in + Arity1 (rtype, vtype), elt + +let conv make_arity shape elt = + let edest, esrc = match elt with + Conv (edest, esrc) | Cast (edest, esrc) -> edest, esrc + | _ -> failwith "Non-conversion element in conversion" in + let vtype = type_for_elt shape esrc + and rtype = type_for_elt shape edest 0 in + make_arity rtype vtype, elt + +let conv_1 = conv (fun rtype vtype -> Arity1 (rtype, vtype 1)) +let conv_2 = conv (fun rtype vtype -> Arity2 (rtype, vtype 1, vtype 2)) + +(* Operation has an unsigned result even if operands are signed. *) + +let dst_unsign make_arity shape elt = + let vtype = type_for_elt shape elt + and rtype = type_for_elt shape (unsigned_of_elt elt) 0 in + make_arity rtype vtype, elt + +let dst_unsign_1 = dst_unsign (fun rtype vtype -> Arity1 (rtype, vtype 1)) + +let make_bits_only func shape elt = + let arity, elt' = func shape elt in + arity, bits_of_elt elt' + +(* Extend operation. *) + +let extend shape elt = + let vtype = type_for_elt shape elt in + Arity3 (vtype 0, vtype 1, vtype 2, vtype 3), bits_of_elt elt + +(* Table look-up operations. Operand 2 is signed/unsigned for signed/unsigned + integer ops respectively, or unsigned for polynomial ops. *) + +let table mkarity shape elt = + let vtype = type_for_elt shape elt in + let op2 = type_for_elt shape (poly_unsigned_variant elt) 2 in + mkarity vtype op2, bits_of_elt elt + +let table_2 = table (fun vtype op2 -> Arity2 (vtype 0, vtype 1, op2)) +let table_io = table (fun vtype op2 -> Arity3 (vtype 0, vtype 0, vtype 1, op2)) + +(* Operations where only bits matter. *) + +let bits_1 = make_bits_only elts_same_1 +let bits_2 = make_bits_only elts_same_2 +let bits_3 = make_bits_only elts_same_3 + +(* Store insns. *) +let store_1 shape elt = + let vtype = type_for_elt shape elt in + Arity2 (T_void, vtype 0, vtype 1), bits_of_elt elt + +let store_3 shape elt = + let vtype = type_for_elt shape elt in + Arity3 (T_void, vtype 0, vtype 1, vtype 2), bits_of_elt elt + +let make_notype func shape elt = + let arity, _ = func shape elt in + arity, NoElts + +let notype_1 = make_notype elts_same_1 +let notype_2 = make_notype elts_same_2 +let notype_3 = make_notype elts_same_3 + +(* Bit-select operations (first operand is unsigned int). *) + +let bit_select shape elt = + let vtype = type_for_elt shape elt + and itype = type_for_elt shape (unsigned_of_elt elt) in + Arity3 (vtype 0, itype 1, vtype 2, vtype 3), NoElts + +(* Common lists of supported element types. *) + +let su_8_32 = [S8; S16; S32; U8; U16; U32] +let su_8_64 = S64 :: U64 :: su_8_32 +let su_16_64 = [S16; S32; S64; U16; U32; U64] +let pf_su_8_32 = P8 :: P16 :: F32 :: su_8_32 +let pf_su_8_64 = P8 :: P16 :: F32 :: su_8_64 + +let ops = + [ + (* Addition. 
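To read the first row below: opcode Vadd, no extra features, a three-operand all-D-register shape, intrinsic stem "vadd", types built by sign_invar_2, over element types F32 plus the signed and unsigned integers from 8 to 64 bits.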
*) + Vadd, [], All (3, Dreg), "vadd", sign_invar_2, F32 :: su_8_64; + Vadd, [], All (3, Qreg), "vaddQ", sign_invar_2, F32 :: su_8_64; + Vadd, [], Long, "vaddl", elts_same_2, su_8_32; + Vadd, [], Wide, "vaddw", elts_same_2, su_8_32; + Vadd, [Halving], All (3, Dreg), "vhadd", elts_same_2, su_8_32; + Vadd, [Halving], All (3, Qreg), "vhaddQ", elts_same_2, su_8_32; + Vadd, [Instruction_name ["vrhadd"]; Rounding; Halving], + All (3, Dreg), "vRhadd", elts_same_2, su_8_32; + Vadd, [Instruction_name ["vrhadd"]; Rounding; Halving], + All (3, Qreg), "vRhaddQ", elts_same_2, su_8_32; + Vadd, [Saturating], All (3, Dreg), "vqadd", elts_same_2, su_8_64; + Vadd, [Saturating], All (3, Qreg), "vqaddQ", elts_same_2, su_8_64; + Vadd, [High_half], Narrow, "vaddhn", sign_invar_2, su_16_64; + Vadd, [Instruction_name ["vraddhn"]; Rounding; High_half], + Narrow, "vRaddhn", sign_invar_2, su_16_64; + + (* Multiplication. *) + Vmul, [], All (3, Dreg), "vmul", sign_invar_2, P8 :: F32 :: su_8_32; + Vmul, [], All (3, Qreg), "vmulQ", sign_invar_2, P8 :: F32 :: su_8_32; + Vmul, [Saturating; Doubling; High_half], All (3, Dreg), "vqdmulh", + elts_same_2, [S16; S32]; + Vmul, [Saturating; Doubling; High_half], All (3, Qreg), "vqdmulhQ", + elts_same_2, [S16; S32]; + Vmul, + [Saturating; Rounding; Doubling; High_half; + Instruction_name ["vqrdmulh"]], + All (3, Dreg), "vqRdmulh", + elts_same_2, [S16; S32]; + Vmul, + [Saturating; Rounding; Doubling; High_half; + Instruction_name ["vqrdmulh"]], + All (3, Qreg), "vqRdmulhQ", + elts_same_2, [S16; S32]; + Vmul, [], Long, "vmull", elts_same_2, P8 :: su_8_32; + Vmul, [Saturating; Doubling], Long, "vqdmull", elts_same_2, [S16; S32]; + + (* Multiply-accumulate. *) + Vmla, [], All (3, Dreg), "vmla", sign_invar_io, F32 :: su_8_32; + Vmla, [], All (3, Qreg), "vmlaQ", sign_invar_io, F32 :: su_8_32; + Vmla, [], Long, "vmlal", elts_same_io, su_8_32; + Vmla, [Saturating; Doubling], Long, "vqdmlal", elts_same_io, [S16; S32]; + + (* Multiply-subtract. *) + Vmls, [], All (3, Dreg), "vmls", sign_invar_io, F32 :: su_8_32; + Vmls, [], All (3, Qreg), "vmlsQ", sign_invar_io, F32 :: su_8_32; + Vmls, [], Long, "vmlsl", elts_same_io, su_8_32; + Vmls, [Saturating; Doubling], Long, "vqdmlsl", elts_same_io, [S16; S32]; + + (* Subtraction. *) + Vsub, [], All (3, Dreg), "vsub", sign_invar_2, F32 :: su_8_64; + Vsub, [], All (3, Qreg), "vsubQ", sign_invar_2, F32 :: su_8_64; + Vsub, [], Long, "vsubl", elts_same_2, su_8_32; + Vsub, [], Wide, "vsubw", elts_same_2, su_8_32; + Vsub, [Halving], All (3, Dreg), "vhsub", elts_same_2, su_8_32; + Vsub, [Halving], All (3, Qreg), "vhsubQ", elts_same_2, su_8_32; + Vsub, [Saturating], All (3, Dreg), "vqsub", elts_same_2, su_8_64; + Vsub, [Saturating], All (3, Qreg), "vqsubQ", elts_same_2, su_8_64; + Vsub, [High_half], Narrow, "vsubhn", sign_invar_2, su_16_64; + Vsub, [Instruction_name ["vrsubhn"]; Rounding; High_half], + Narrow, "vRsubhn", sign_invar_2, su_16_64; + + (* Comparison, equal. *) + Vceq, [], All (3, Dreg), "vceq", cmp_sign_invar, P8 :: F32 :: su_8_32; + Vceq, [], All (3, Qreg), "vceqQ", cmp_sign_invar, P8 :: F32 :: su_8_32; + + (* Comparison, greater-than or equal. *) + Vcge, [], All (3, Dreg), "vcge", cmp_sign_matters, F32 :: su_8_32; + Vcge, [], All (3, Qreg), "vcgeQ", cmp_sign_matters, F32 :: su_8_32; + + (* Comparison, less-than or equal. 
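There is no distinct vcle encoding; the Flipped feature below implements the intrinsic by swapping the operands of vcge.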
*) + Vcle, [Flipped "vcge"], All (3, Dreg), "vcle", cmp_sign_matters, + F32 :: su_8_32; + Vcle, [Instruction_name ["vcge"]; Flipped "vcgeQ"], + All (3, Qreg), "vcleQ", cmp_sign_matters, + F32 :: su_8_32; + + (* Comparison, greater-than. *) + Vcgt, [], All (3, Dreg), "vcgt", cmp_sign_matters, F32 :: su_8_32; + Vcgt, [], All (3, Qreg), "vcgtQ", cmp_sign_matters, F32 :: su_8_32; + + (* Comparison, less-than. *) + Vclt, [Flipped "vcgt"], All (3, Dreg), "vclt", cmp_sign_matters, + F32 :: su_8_32; + Vclt, [Instruction_name ["vcgt"]; Flipped "vcgtQ"], + All (3, Qreg), "vcltQ", cmp_sign_matters, + F32 :: su_8_32; + + (* Compare absolute greater-than or equal. *) + Vcage, [Instruction_name ["vacge"]], + All (3, Dreg), "vcage", cmp_sign_matters, [F32]; + Vcage, [Instruction_name ["vacge"]], + All (3, Qreg), "vcageQ", cmp_sign_matters, [F32]; + + (* Compare absolute less-than or equal. *) + Vcale, [Instruction_name ["vacge"]; Flipped "vcage"], + All (3, Dreg), "vcale", cmp_sign_matters, [F32]; + Vcale, [Instruction_name ["vacge"]; Flipped "vcageQ"], + All (3, Qreg), "vcaleQ", cmp_sign_matters, [F32]; + + (* Compare absolute greater-than. *) + Vcagt, [Instruction_name ["vacgt"]], + All (3, Dreg), "vcagt", cmp_sign_matters, [F32]; + Vcagt, [Instruction_name ["vacgt"]], + All (3, Qreg), "vcagtQ", cmp_sign_matters, [F32]; + + (* Compare absolute less-than. *) + Vcalt, [Instruction_name ["vacgt"]; Flipped "vcagt"], + All (3, Dreg), "vcalt", cmp_sign_matters, [F32]; + Vcalt, [Instruction_name ["vacgt"]; Flipped "vcagtQ"], + All (3, Qreg), "vcaltQ", cmp_sign_matters, [F32]; + + (* Test bits. *) + Vtst, [], All (3, Dreg), "vtst", cmp_bits, P8 :: su_8_32; + Vtst, [], All (3, Qreg), "vtstQ", cmp_bits, P8 :: su_8_32; + + (* Absolute difference. *) + Vabd, [], All (3, Dreg), "vabd", elts_same_2, F32 :: su_8_32; + Vabd, [], All (3, Qreg), "vabdQ", elts_same_2, F32 :: su_8_32; + Vabd, [], Long, "vabdl", elts_same_2, su_8_32; + + (* Absolute difference and accumulate. *) + Vaba, [], All (3, Dreg), "vaba", elts_same_io, su_8_32; + Vaba, [], All (3, Qreg), "vabaQ", elts_same_io, su_8_32; + Vaba, [], Long, "vabal", elts_same_io, su_8_32; + + (* Max. *) + Vmax, [], All (3, Dreg), "vmax", elts_same_2, F32 :: su_8_32; + Vmax, [], All (3, Qreg), "vmaxQ", elts_same_2, F32 :: su_8_32; + + (* Min. *) + Vmin, [], All (3, Dreg), "vmin", elts_same_2, F32 :: su_8_32; + Vmin, [], All (3, Qreg), "vminQ", elts_same_2, F32 :: su_8_32; + + (* Pairwise add. *) + Vpadd, [], All (3, Dreg), "vpadd", sign_invar_2, F32 :: su_8_32; + Vpadd, [], Long_noreg Dreg, "vpaddl", elts_same_1, su_8_32; + Vpadd, [], Long_noreg Qreg, "vpaddlQ", elts_same_1, su_8_32; + + (* Pairwise add, widen and accumulate. *) + Vpada, [], Wide_noreg Dreg, "vpadal", elts_same_2, su_8_32; + Vpada, [], Wide_noreg Qreg, "vpadalQ", elts_same_2, su_8_32; + + (* Folding maximum, minimum. *) + Vpmax, [], All (3, Dreg), "vpmax", elts_same_2, F32 :: su_8_32; + Vpmin, [], All (3, Dreg), "vpmin", elts_same_2, F32 :: su_8_32; + + (* Reciprocal step. *) + Vrecps, [], All (3, Dreg), "vrecps", elts_same_2, [F32]; + Vrecps, [], All (3, Qreg), "vrecpsQ", elts_same_2, [F32]; + Vrsqrts, [], All (3, Dreg), "vrsqrts", elts_same_2, [F32]; + Vrsqrts, [], All (3, Qreg), "vrsqrtsQ", elts_same_2, [F32]; + + (* Vector shift left. 
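The shift amount here is a vector of per-lane signed counts rather than an immediate; e.g. vshl_s8 takes two int8x8_t arguments, and a negative count in a lane of the second argument should shift the corresponding lane of the first to the right.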
*) + Vshl, [], All (3, Dreg), "vshl", reg_shift, su_8_64; + Vshl, [], All (3, Qreg), "vshlQ", reg_shift, su_8_64; + Vshl, [Instruction_name ["vrshl"]; Rounding], + All (3, Dreg), "vRshl", reg_shift, su_8_64; + Vshl, [Instruction_name ["vrshl"]; Rounding], + All (3, Qreg), "vRshlQ", reg_shift, su_8_64; + Vshl, [Saturating], All (3, Dreg), "vqshl", reg_shift, su_8_64; + Vshl, [Saturating], All (3, Qreg), "vqshlQ", reg_shift, su_8_64; + Vshl, [Instruction_name ["vqrshl"]; Saturating; Rounding], + All (3, Dreg), "vqRshl", reg_shift, su_8_64; + Vshl, [Instruction_name ["vqrshl"]; Saturating; Rounding], + All (3, Qreg), "vqRshlQ", reg_shift, su_8_64; + + (* Vector shift right by constant. *) + Vshr_n, [], Binary_imm Dreg, "vshr_n", shift_right, su_8_64; + Vshr_n, [], Binary_imm Qreg, "vshrQ_n", shift_right, su_8_64; + Vshr_n, [Instruction_name ["vrshr"]; Rounding], Binary_imm Dreg, + "vRshr_n", shift_right, su_8_64; + Vshr_n, [Instruction_name ["vrshr"]; Rounding], Binary_imm Qreg, + "vRshrQ_n", shift_right, su_8_64; + Vshr_n, [], Narrow_imm, "vshrn_n", shift_right_sign_invar, su_16_64; + Vshr_n, [Instruction_name ["vrshrn"]; Rounding], Narrow_imm, "vRshrn_n", + shift_right_sign_invar, su_16_64; + Vshr_n, [Saturating], Narrow_imm, "vqshrn_n", shift_right, su_16_64; + Vshr_n, [Instruction_name ["vqrshrn"]; Saturating; Rounding], Narrow_imm, + "vqRshrn_n", shift_right, su_16_64; + Vshr_n, [Saturating; Dst_unsign], Narrow_imm, "vqshrun_n", + shift_right_to_uns, [S16; S32; S64]; + Vshr_n, [Instruction_name ["vqrshrun"]; Saturating; Dst_unsign; Rounding], + Narrow_imm, "vqRshrun_n", shift_right_to_uns, [S16; S32; S64]; + + (* Vector shift left by constant. *) + Vshl_n, [], Binary_imm Dreg, "vshl_n", shift_left_sign_invar, su_8_64; + Vshl_n, [], Binary_imm Qreg, "vshlQ_n", shift_left_sign_invar, su_8_64; + Vshl_n, [Saturating], Binary_imm Dreg, "vqshl_n", shift_left, su_8_64; + Vshl_n, [Saturating], Binary_imm Qreg, "vqshlQ_n", shift_left, su_8_64; + Vshl_n, [Saturating; Dst_unsign], Binary_imm Dreg, "vqshlu_n", + shift_left_to_uns, [S8; S16; S32; S64]; + Vshl_n, [Saturating; Dst_unsign], Binary_imm Qreg, "vqshluQ_n", + shift_left_to_uns, [S8; S16; S32; S64]; + Vshl_n, [], Long_imm, "vshll_n", shift_left, su_8_32; + + (* Vector shift right by constant and accumulate. *) + Vsra_n, [], Binary_imm Dreg, "vsra_n", shift_right_acc, su_8_64; + Vsra_n, [], Binary_imm Qreg, "vsraQ_n", shift_right_acc, su_8_64; + Vsra_n, [Instruction_name ["vrsra"]; Rounding], Binary_imm Dreg, + "vRsra_n", shift_right_acc, su_8_64; + Vsra_n, [Instruction_name ["vrsra"]; Rounding], Binary_imm Qreg, + "vRsraQ_n", shift_right_acc, su_8_64; + + (* Vector shift right and insert. *) + Vsri, [], Use_operands [| Dreg; Dreg; Immed |], "vsri_n", shift_insert, + P8 :: P16 :: su_8_64; + Vsri, [], Use_operands [| Qreg; Qreg; Immed |], "vsriQ_n", shift_insert, + P8 :: P16 :: su_8_64; + + (* Vector shift left and insert. *) + Vsli, [], Use_operands [| Dreg; Dreg; Immed |], "vsli_n", shift_insert, + P8 :: P16 :: su_8_64; + Vsli, [], Use_operands [| Qreg; Qreg; Immed |], "vsliQ_n", shift_insert, + P8 :: P16 :: su_8_64; + + (* Absolute value. *) + Vabs, [], All (2, Dreg), "vabs", elts_same_1, [S8; S16; S32; F32]; + Vabs, [], All (2, Qreg), "vabsQ", elts_same_1, [S8; S16; S32; F32]; + Vabs, [Saturating], All (2, Dreg), "vqabs", elts_same_1, [S8; S16; S32]; + Vabs, [Saturating], All (2, Qreg), "vqabsQ", elts_same_1, [S8; S16; S32]; + + (* Negate. 
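Note that the saturating forms below saturate instead of wrapping; e.g. vqneg_s8 on an input lane of -128 should produce 127.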
*) + Vneg, [], All (2, Dreg), "vneg", elts_same_1, [S8; S16; S32; F32]; + Vneg, [], All (2, Qreg), "vnegQ", elts_same_1, [S8; S16; S32; F32]; + Vneg, [Saturating], All (2, Dreg), "vqneg", elts_same_1, [S8; S16; S32]; + Vneg, [Saturating], All (2, Qreg), "vqnegQ", elts_same_1, [S8; S16; S32]; + + (* Bitwise not. *) + Vmvn, [], All (2, Dreg), "vmvn", notype_1, P8 :: su_8_32; + Vmvn, [], All (2, Qreg), "vmvnQ", notype_1, P8 :: su_8_32; + + (* Count leading sign bits. *) + Vcls, [], All (2, Dreg), "vcls", elts_same_1, [S8; S16; S32]; + Vcls, [], All (2, Qreg), "vclsQ", elts_same_1, [S8; S16; S32]; + + (* Count leading zeros. *) + Vclz, [], All (2, Dreg), "vclz", sign_invar_1, su_8_32; + Vclz, [], All (2, Qreg), "vclzQ", sign_invar_1, su_8_32; + + (* Count number of set bits. *) + Vcnt, [], All (2, Dreg), "vcnt", bits_1, [P8; S8; U8]; + Vcnt, [], All (2, Qreg), "vcntQ", bits_1, [P8; S8; U8]; + + (* Reciprocal estimate. *) + Vrecpe, [], All (2, Dreg), "vrecpe", elts_same_1, [U32; F32]; + Vrecpe, [], All (2, Qreg), "vrecpeQ", elts_same_1, [U32; F32]; + + (* Reciprocal square-root estimate. *) + Vrsqrte, [], All (2, Dreg), "vrsqrte", elts_same_1, [U32; F32]; + Vrsqrte, [], All (2, Qreg), "vrsqrteQ", elts_same_1, [U32; F32]; + + (* Get lanes from a vector. *) + Vget_lane, + [InfoWord; Disassembles_as [Use_operands [| Corereg; Element_of_dreg |]]; + Instruction_name ["vmov"]], + Use_operands [| Corereg; Dreg; Immed |], + "vget_lane", get_lane, pf_su_8_32; + Vget_lane, + [InfoWord; + Disassembles_as [Use_operands [| Corereg; Corereg; Dreg |]]; + Instruction_name ["vmov"]; Const_valuator (fun _ -> 0)], + Use_operands [| Corereg; Dreg; Immed |], + "vget_lane", notype_2, [S64; U64]; + Vget_lane, + [InfoWord; Disassembles_as [Use_operands [| Corereg; Element_of_dreg |]]; + Instruction_name ["vmov"]], + Use_operands [| Corereg; Qreg; Immed |], + "vgetQ_lane", get_lane, pf_su_8_32; + Vget_lane, + [InfoWord; + Disassembles_as [Use_operands [| Corereg; Corereg; Dreg |]]; + Instruction_name ["vmov"]; Const_valuator (fun _ -> 0)], + Use_operands [| Corereg; Qreg; Immed |], + "vgetQ_lane", notype_2, [S64; U64]; + + (* Set lanes in a vector. *) + Vset_lane, [Disassembles_as [Use_operands [| Element_of_dreg; Corereg |]]; + Instruction_name ["vmov"]], + Use_operands [| Dreg; Corereg; Dreg; Immed |], "vset_lane", + set_lane, pf_su_8_32; + Vset_lane, [Disassembles_as [Use_operands [| Dreg; Corereg; Corereg |]]; + Instruction_name ["vmov"]; Const_valuator (fun _ -> 0)], + Use_operands [| Dreg; Corereg; Dreg; Immed |], "vset_lane", + set_lane_notype, [S64; U64]; + Vset_lane, [Disassembles_as [Use_operands [| Element_of_dreg; Corereg |]]; + Instruction_name ["vmov"]], + Use_operands [| Qreg; Corereg; Qreg; Immed |], "vsetQ_lane", + set_lane, pf_su_8_32; + Vset_lane, [Disassembles_as [Use_operands [| Dreg; Corereg; Corereg |]]; + Instruction_name ["vmov"]; Const_valuator (fun _ -> 0)], + Use_operands [| Qreg; Corereg; Qreg; Immed |], "vsetQ_lane", + set_lane_notype, [S64; U64]; + + (* Create vector from literal bit pattern. *) + Vcreate, + [No_op], (* Not really, but it can yield various things that are too + hard for the test generator at this time. *) + Use_operands [| Dreg; Corereg |], "vcreate", create_vector, + pf_su_8_64; + + (* Set all lanes to the same value. 
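E.g. vdup_n_u32 should take a uint32_t scalar and return a uint32x2_t with both lanes set to it; the Q variants broadcast into all lanes of a quadword vector.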
*) + Vdup_n, [], + Use_operands [| Dreg; Corereg |], "vdup_n", bits_1, + pf_su_8_32; + Vdup_n, + [Instruction_name ["vmov"]; + Disassembles_as [Use_operands [| Dreg; Corereg; Corereg |]]], + Use_operands [| Dreg; Corereg |], "vdup_n", notype_1, + [S64; U64]; + Vdup_n, [], + Use_operands [| Qreg; Corereg |], "vdupQ_n", bits_1, + pf_su_8_32; + Vdup_n, + [Instruction_name ["vmov"]; + Disassembles_as [Use_operands [| Dreg; Corereg; Corereg |]; + Use_operands [| Dreg; Corereg; Corereg |]]], + Use_operands [| Qreg; Corereg |], "vdupQ_n", notype_1, + [S64; U64]; + + (* These are just aliases for the above. *) + Vmov_n, + [Builtin_name "vdup_n"], + Use_operands [| Dreg; Corereg |], + "vmov_n", bits_1, pf_su_8_32; + Vmov_n, + [Builtin_name "vdup_n"; + Instruction_name ["vmov"]; + Disassembles_as [Use_operands [| Dreg; Corereg; Corereg |]]], + Use_operands [| Dreg; Corereg |], + "vmov_n", notype_1, [S64; U64]; + Vmov_n, + [Builtin_name "vdupQ_n"], + Use_operands [| Qreg; Corereg |], + "vmovQ_n", bits_1, pf_su_8_32; + Vmov_n, + [Builtin_name "vdupQ_n"; + Instruction_name ["vmov"]; + Disassembles_as [Use_operands [| Dreg; Corereg; Corereg |]; + Use_operands [| Dreg; Corereg; Corereg |]]], + Use_operands [| Qreg; Corereg |], + "vmovQ_n", notype_1, [S64; U64]; + + (* Duplicate, lane version. We can't use Use_operands here because the + rightmost register (always Dreg) would be picked up by find_key_operand, + when we want the leftmost register to be used in this case (otherwise + the modes are indistinguishable in neon.md, etc.). + *) + Vdup_lane, + [Disassembles_as [Use_operands [| Dreg; Element_of_dreg |]]], + Unary_scalar Dreg, "vdup_lane", bits_2, pf_su_8_32; + Vdup_lane, + [No_op; Const_valuator (fun _ -> 0)], + Unary_scalar Dreg, "vdup_lane", bits_2, [S64; U64]; + Vdup_lane, + [Disassembles_as [Use_operands [| Qreg; Element_of_dreg |]]], + Unary_scalar Qreg, "vdupQ_lane", bits_2, pf_su_8_32; + Vdup_lane, + [No_op; Const_valuator (fun _ -> 0)], + Unary_scalar Qreg, "vdupQ_lane", bits_2, [S64; U64]; + + (* Combining vectors. *) + Vcombine, [No_op], + Use_operands [| Qreg; Dreg; Dreg |], "vcombine", notype_2, + pf_su_8_64; + + (* Splitting vectors. *) + Vget_high, [No_op], + Use_operands [| Dreg; Qreg |], "vget_high", + notype_1, pf_su_8_64; + Vget_low, [Instruction_name ["vmov"]; + Disassembles_as [Use_operands [| Dreg; Dreg |]]], + Use_operands [| Dreg; Qreg |], "vget_low", + notype_1, pf_su_8_64; + + (* Conversions. *) + Vcvt, [InfoWord], All (2, Dreg), "vcvt", conv_1, + [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)]; + Vcvt, [InfoWord], All (2, Qreg), "vcvtQ", conv_1, + [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)]; + Vcvt_n, [InfoWord], Use_operands [| Dreg; Dreg; Immed |], "vcvt_n", conv_2, + [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)]; + Vcvt_n, [InfoWord], Use_operands [| Qreg; Qreg; Immed |], "vcvtQ_n", conv_2, + [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)]; + + (* Move, narrowing. *) + Vmovn, [Disassembles_as [Use_operands [| Dreg; Qreg |]]], + Narrow, "vmovn", sign_invar_1, su_16_64; + Vmovn, [Disassembles_as [Use_operands [| Dreg; Qreg |]]; Saturating], + Narrow, "vqmovn", elts_same_1, su_16_64; + Vmovn, + [Disassembles_as [Use_operands [| Dreg; Qreg |]]; Saturating; Dst_unsign], + Narrow, "vqmovun", dst_unsign_1, + [S16; S32; S64]; + + (* Move, long. *) + Vmovl, [Disassembles_as [Use_operands [| Qreg; Dreg |]]], + Long, "vmovl", elts_same_1, su_8_32; + + (* Table lookup. 
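Roughly, vtbl1_u8 treats each byte of its second argument as an index into the 8-byte table given by its first, with out-of-range indices producing zero; the vtbx forms further below instead leave the corresponding destination byte unchanged.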
*) + Vtbl 1, + [Instruction_name ["vtbl"]; + Disassembles_as [Use_operands [| Dreg; VecArray (1, Dreg); Dreg |]]], + Use_operands [| Dreg; Dreg; Dreg |], "vtbl1", table_2, [U8; S8; P8]; + Vtbl 2, [Instruction_name ["vtbl"]], + Use_operands [| Dreg; VecArray (2, Dreg); Dreg |], "vtbl2", table_2, + [U8; S8; P8]; + Vtbl 3, [Instruction_name ["vtbl"]], + Use_operands [| Dreg; VecArray (3, Dreg); Dreg |], "vtbl3", table_2, + [U8; S8; P8]; + Vtbl 4, [Instruction_name ["vtbl"]], + Use_operands [| Dreg; VecArray (4, Dreg); Dreg |], "vtbl4", table_2, + [U8; S8; P8]; + + (* Extended table lookup. *) + Vtbx 1, + [Instruction_name ["vtbx"]; + Disassembles_as [Use_operands [| Dreg; VecArray (1, Dreg); Dreg |]]], + Use_operands [| Dreg; Dreg; Dreg |], "vtbx1", table_io, [U8; S8; P8]; + Vtbx 2, [Instruction_name ["vtbx"]], + Use_operands [| Dreg; VecArray (2, Dreg); Dreg |], "vtbx2", table_io, + [U8; S8; P8]; + Vtbx 3, [Instruction_name ["vtbx"]], + Use_operands [| Dreg; VecArray (3, Dreg); Dreg |], "vtbx3", table_io, + [U8; S8; P8]; + Vtbx 4, [Instruction_name ["vtbx"]], + Use_operands [| Dreg; VecArray (4, Dreg); Dreg |], "vtbx4", table_io, + [U8; S8; P8]; + + (* Multiply, lane. (note: these were undocumented at the time of + writing). *) + Vmul_lane, [], By_scalar Dreg, "vmul_lane", sign_invar_2_lane, + [S16; S32; U16; U32; F32]; + Vmul_lane, [], By_scalar Qreg, "vmulQ_lane", sign_invar_2_lane, + [S16; S32; U16; U32; F32]; + + (* Multiply-accumulate, lane. *) + Vmla_lane, [], By_scalar Dreg, "vmla_lane", sign_invar_io_lane, + [S16; S32; U16; U32; F32]; + Vmla_lane, [], By_scalar Qreg, "vmlaQ_lane", sign_invar_io_lane, + [S16; S32; U16; U32; F32]; + Vmla_lane, [], Wide_lane, "vmlal_lane", elts_same_io_lane, + [S16; S32; U16; U32]; + Vmla_lane, [Saturating; Doubling], Wide_lane, "vqdmlal_lane", + elts_same_io_lane, [S16; S32]; + + (* Multiply-subtract, lane. *) + Vmls_lane, [], By_scalar Dreg, "vmls_lane", sign_invar_io_lane, + [S16; S32; U16; U32; F32]; + Vmls_lane, [], By_scalar Qreg, "vmlsQ_lane", sign_invar_io_lane, + [S16; S32; U16; U32; F32]; + Vmls_lane, [], Wide_lane, "vmlsl_lane", elts_same_io_lane, + [S16; S32; U16; U32]; + Vmls_lane, [Saturating; Doubling], Wide_lane, "vqdmlsl_lane", + elts_same_io_lane, [S16; S32]; + + (* Long multiply, lane. *) + Vmull_lane, [], + Wide_lane, "vmull_lane", elts_same_2_lane, [S16; S32; U16; U32]; + + (* Saturating doubling long multiply, lane. *) + Vqdmull_lane, [Saturating; Doubling], + Wide_lane, "vqdmull_lane", elts_same_2_lane, [S16; S32]; + + (* Saturating doubling long multiply high, lane. *) + Vqdmulh_lane, [Saturating; Halving], + By_scalar Qreg, "vqdmulhQ_lane", elts_same_2_lane, [S16; S32]; + Vqdmulh_lane, [Saturating; Halving], + By_scalar Dreg, "vqdmulh_lane", elts_same_2_lane, [S16; S32]; + Vqdmulh_lane, [Saturating; Halving; Rounding; + Instruction_name ["vqrdmulh"]], + By_scalar Qreg, "vqRdmulhQ_lane", elts_same_2_lane, [S16; S32]; + Vqdmulh_lane, [Saturating; Halving; Rounding; + Instruction_name ["vqrdmulh"]], + By_scalar Dreg, "vqRdmulh_lane", elts_same_2_lane, [S16; S32]; + + (* Vector multiply by scalar. *) + Vmul_n, [InfoWord; + Disassembles_as [Use_operands [| Dreg; Dreg; Element_of_dreg |]]], + Use_operands [| Dreg; Dreg; Corereg |], "vmul_n", + sign_invar_2, [S16; S32; U16; U32; F32]; + Vmul_n, [InfoWord; + Disassembles_as [Use_operands [| Qreg; Qreg; Element_of_dreg |]]], + Use_operands [| Qreg; Qreg; Corereg |], "vmulQ_n", + sign_invar_2, [S16; S32; U16; U32; F32]; + + (* Vector long multiply by scalar. 
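E.g. vmull_n_s16 should take an int16x4_t vector plus an int16_t scalar and return an int32x4_t of widened products.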
*) + Vmull_n, [Instruction_name ["vmull"]; + Disassembles_as [Use_operands [| Qreg; Dreg; Element_of_dreg |]]], + Wide_scalar, "vmull_n", + elts_same_2, [S16; S32; U16; U32]; + + (* Vector saturating doubling long multiply by scalar. *) + Vqdmull_n, [Saturating; Doubling; + Disassembles_as [Use_operands [| Qreg; Dreg; + Element_of_dreg |]]], + Wide_scalar, "vqdmull_n", + elts_same_2, [S16; S32]; + + (* Vector saturating doubling long multiply high by scalar. *) + Vqdmulh_n, + [Saturating; Halving; InfoWord; + Disassembles_as [Use_operands [| Qreg; Qreg; Element_of_dreg |]]], + Use_operands [| Qreg; Qreg; Corereg |], + "vqdmulhQ_n", elts_same_2, [S16; S32]; + Vqdmulh_n, + [Saturating; Halving; InfoWord; + Disassembles_as [Use_operands [| Dreg; Dreg; Element_of_dreg |]]], + Use_operands [| Dreg; Dreg; Corereg |], + "vqdmulh_n", elts_same_2, [S16; S32]; + Vqdmulh_n, + [Saturating; Halving; Rounding; InfoWord; + Instruction_name ["vqrdmulh"]; + Disassembles_as [Use_operands [| Qreg; Qreg; Element_of_dreg |]]], + Use_operands [| Qreg; Qreg; Corereg |], + "vqRdmulhQ_n", elts_same_2, [S16; S32]; + Vqdmulh_n, + [Saturating; Halving; Rounding; InfoWord; + Instruction_name ["vqrdmulh"]; + Disassembles_as [Use_operands [| Dreg; Dreg; Element_of_dreg |]]], + Use_operands [| Dreg; Dreg; Corereg |], + "vqRdmulh_n", elts_same_2, [S16; S32]; + + (* Vector multiply-accumulate by scalar. *) + Vmla_n, [InfoWord; + Disassembles_as [Use_operands [| Dreg; Dreg; Element_of_dreg |]]], + Use_operands [| Dreg; Dreg; Corereg |], "vmla_n", + sign_invar_io, [S16; S32; U16; U32; F32]; + Vmla_n, [InfoWord; + Disassembles_as [Use_operands [| Qreg; Qreg; Element_of_dreg |]]], + Use_operands [| Qreg; Qreg; Corereg |], "vmlaQ_n", + sign_invar_io, [S16; S32; U16; U32; F32]; + Vmla_n, [], Wide_scalar, "vmlal_n", elts_same_io, [S16; S32; U16; U32]; + Vmla_n, [Saturating; Doubling], Wide_scalar, "vqdmlal_n", elts_same_io, + [S16; S32]; + + (* Vector multiply subtract by scalar. *) + Vmls_n, [InfoWord; + Disassembles_as [Use_operands [| Dreg; Dreg; Element_of_dreg |]]], + Use_operands [| Dreg; Dreg; Corereg |], "vmls_n", + sign_invar_io, [S16; S32; U16; U32; F32]; + Vmls_n, [InfoWord; + Disassembles_as [Use_operands [| Qreg; Qreg; Element_of_dreg |]]], + Use_operands [| Qreg; Qreg; Corereg |], "vmlsQ_n", + sign_invar_io, [S16; S32; U16; U32; F32]; + Vmls_n, [], Wide_scalar, "vmlsl_n", elts_same_io, [S16; S32; U16; U32]; + Vmls_n, [Saturating; Doubling], Wide_scalar, "vqdmlsl_n", elts_same_io, + [S16; S32]; + + (* Vector extract. *) + Vext, [Const_valuator (fun _ -> 0)], + Use_operands [| Dreg; Dreg; Dreg; Immed |], "vext", extend, + pf_su_8_64; + Vext, [Const_valuator (fun _ -> 0)], + Use_operands [| Qreg; Qreg; Qreg; Immed |], "vextQ", extend, + pf_su_8_64; + + (* Reverse elements. *) + Vrev64, [], All (2, Dreg), "vrev64", bits_1, P8 :: P16 :: F32 :: su_8_32; + Vrev64, [], All (2, Qreg), "vrev64Q", bits_1, P8 :: P16 :: F32 :: su_8_32; + Vrev32, [], All (2, Dreg), "vrev32", bits_1, [P8; P16; S8; U8; S16; U16]; + Vrev32, [], All (2, Qreg), "vrev32Q", bits_1, [P8; P16; S8; U8; S16; U16]; + Vrev16, [], All (2, Dreg), "vrev16", bits_1, [P8; S8; U8]; + Vrev16, [], All (2, Qreg), "vrev16Q", bits_1, [P8; S8; U8]; + + (* Bit selection. 
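The semantics are that each result bit comes from the second argument where the corresponding bit of the first (unsigned) selector argument is set, and from the third where it is clear, which is why bit_select above gives the first operand an unsigned type.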
*) + Vbsl, + [Instruction_name ["vbsl"; "vbit"; "vbif"]; + Disassembles_as [Use_operands [| Dreg; Dreg; Dreg |]]], + Use_operands [| Dreg; Dreg; Dreg; Dreg |], "vbsl", bit_select, + pf_su_8_64; + Vbsl, + [Instruction_name ["vbsl"; "vbit"; "vbif"]; + Disassembles_as [Use_operands [| Qreg; Qreg; Qreg |]]], + Use_operands [| Qreg; Qreg; Qreg; Qreg |], "vbslQ", bit_select, + pf_su_8_64; + + (* Transpose elements. **NOTE** ReturnPtr goes some of the way towards + generating good code for intrinsics which return structure types -- + builtins work well by themselves (and understand that the values being + stored on e.g. the stack also reside in registers, so can optimise the + stores away entirely if the results are used immediately), but + intrinsics are very much less efficient. Maybe something can be improved + re: inlining, or tweaking the ABI used for intrinsics (a special call + attribute?). + *) + Vtrn, [ReturnPtr], Pair_result Dreg, "vtrn", bits_2, pf_su_8_32; + Vtrn, [ReturnPtr], Pair_result Qreg, "vtrnQ", bits_2, pf_su_8_32; + + (* Zip elements. *) + Vzip, [ReturnPtr], Pair_result Dreg, "vzip", bits_2, pf_su_8_32; + Vzip, [ReturnPtr], Pair_result Qreg, "vzipQ", bits_2, pf_su_8_32; + + (* Unzip elements. *) + Vuzp, [ReturnPtr], Pair_result Dreg, "vuzp", bits_2, pf_su_8_32; + Vuzp, [ReturnPtr], Pair_result Qreg, "vuzpQ", bits_2, pf_su_8_32; + + (* Element/structure loads. VLD1 variants. *) + Vldx 1, + [Disassembles_as [Use_operands [| VecArray (1, Dreg); + CstPtrTo Corereg |]]], + Use_operands [| Dreg; CstPtrTo Corereg |], "vld1", bits_1, + pf_su_8_64; + Vldx 1, [Disassembles_as [Use_operands [| VecArray (2, Dreg); + CstPtrTo Corereg |]]], + Use_operands [| Qreg; CstPtrTo Corereg |], "vld1Q", bits_1, + pf_su_8_64; + + Vldx_lane 1, + [Disassembles_as [Use_operands [| VecArray (1, Element_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| Dreg; CstPtrTo Corereg; Dreg; Immed |], + "vld1_lane", bits_3, pf_su_8_32; + Vldx_lane 1, + [Disassembles_as [Use_operands [| VecArray (1, Dreg); + CstPtrTo Corereg |]]; + Const_valuator (fun _ -> 0)], + Use_operands [| Dreg; CstPtrTo Corereg; Dreg; Immed |], + "vld1_lane", bits_3, [S64; U64]; + Vldx_lane 1, + [Disassembles_as [Use_operands [| VecArray (1, Element_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| Qreg; CstPtrTo Corereg; Qreg; Immed |], + "vld1Q_lane", bits_3, pf_su_8_32; + Vldx_lane 1, + [Disassembles_as [Use_operands [| VecArray (1, Dreg); + CstPtrTo Corereg |]]], + Use_operands [| Qreg; CstPtrTo Corereg; Qreg; Immed |], + "vld1Q_lane", bits_3, [S64; U64]; + + Vldx_dup 1, + [Disassembles_as [Use_operands [| VecArray (1, All_elements_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| Dreg; CstPtrTo Corereg |], "vld1_dup", + bits_1, pf_su_8_32; + Vldx_dup 1, + [Disassembles_as [Use_operands [| VecArray (1, Dreg); + CstPtrTo Corereg |]]], + Use_operands [| Dreg; CstPtrTo Corereg |], "vld1_dup", + bits_1, [S64; U64]; + Vldx_dup 1, + [Disassembles_as [Use_operands [| VecArray (2, All_elements_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| Qreg; CstPtrTo Corereg |], "vld1Q_dup", + bits_1, pf_su_8_32; + Vldx_dup 1, + [Disassembles_as [Use_operands [| VecArray (2, Dreg); + CstPtrTo Corereg |]]], + Use_operands [| Qreg; CstPtrTo Corereg |], "vld1Q_dup", + bits_1, [S64; U64]; + + (* VST1 variants. 
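These take the destination pointer as their first operand; e.g. vst1_s8 should expand to a single store with a signature along the lines of void vst1_s8 (int8_t *, int8x8_t).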
*) + Vstx 1, [Disassembles_as [Use_operands [| VecArray (1, Dreg); + PtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; Dreg |], "vst1", + store_1, pf_su_8_64; + Vstx 1, [Disassembles_as [Use_operands [| VecArray (2, Dreg); + PtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; Qreg |], "vst1Q", + store_1, pf_su_8_64; + + Vstx_lane 1, + [Disassembles_as [Use_operands [| VecArray (1, Element_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; Dreg; Immed |], + "vst1_lane", store_3, pf_su_8_32; + Vstx_lane 1, + [Disassembles_as [Use_operands [| VecArray (1, Dreg); + CstPtrTo Corereg |]]; + Const_valuator (fun _ -> 0)], + Use_operands [| PtrTo Corereg; Dreg; Immed |], + "vst1_lane", store_3, [U64; S64]; + Vstx_lane 1, + [Disassembles_as [Use_operands [| VecArray (1, Element_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; Qreg; Immed |], + "vst1Q_lane", store_3, pf_su_8_32; + Vstx_lane 1, + [Disassembles_as [Use_operands [| VecArray (1, Dreg); + CstPtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; Qreg; Immed |], + "vst1Q_lane", store_3, [U64; S64]; + + (* VLD2 variants. *) + Vldx 2, [], Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg |], + "vld2", bits_1, pf_su_8_32; + Vldx 2, [Instruction_name ["vld1"]], + Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg |], + "vld2", bits_1, [S64; U64]; + Vldx 2, [Disassembles_as [Use_operands [| VecArray (2, Dreg); + CstPtrTo Corereg |]; + Use_operands [| VecArray (2, Dreg); + CstPtrTo Corereg |]]], + Use_operands [| VecArray (2, Qreg); CstPtrTo Corereg |], + "vld2Q", bits_1, pf_su_8_32; + + Vldx_lane 2, + [Disassembles_as [Use_operands + [| VecArray (2, Element_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg; + VecArray (2, Dreg); Immed |], + "vld2_lane", bits_3, P8 :: P16 :: F32 :: su_8_32; + Vldx_lane 2, + [Disassembles_as [Use_operands + [| VecArray (2, Element_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| VecArray (2, Qreg); CstPtrTo Corereg; + VecArray (2, Qreg); Immed |], + "vld2Q_lane", bits_3, [P16; F32; U16; U32; S16; S32]; + + Vldx_dup 2, + [Disassembles_as [Use_operands + [| VecArray (2, All_elements_of_dreg); CstPtrTo Corereg |]]], + Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg |], + "vld2_dup", bits_1, pf_su_8_32; + Vldx_dup 2, + [Instruction_name ["vld1"]; Disassembles_as [Use_operands + [| VecArray (2, Dreg); CstPtrTo Corereg |]]], + Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg |], + "vld2_dup", bits_1, [S64; U64]; + + (* VST2 variants. 
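These store with interleaving; e.g. vst2_u8 should write its two D registers to memory in the order a0, b0, a1, b1 and so on, and the VST3/VST4 variants below interleave analogously with stride three and four.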
*) + Vstx 2, [Disassembles_as [Use_operands [| VecArray (2, Dreg); + PtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; VecArray (2, Dreg) |], "vst2", + store_1, pf_su_8_32; + Vstx 2, [Disassembles_as [Use_operands [| VecArray (2, Dreg); + PtrTo Corereg |]]; + Instruction_name ["vst1"]], + Use_operands [| PtrTo Corereg; VecArray (2, Dreg) |], "vst2", + store_1, [S64; U64]; + Vstx 2, [Disassembles_as [Use_operands [| VecArray (2, Dreg); + PtrTo Corereg |]; + Use_operands [| VecArray (2, Dreg); + PtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; VecArray (2, Qreg) |], "vst2Q", + store_1, pf_su_8_32; + + Vstx_lane 2, + [Disassembles_as [Use_operands + [| VecArray (2, Element_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; VecArray (2, Dreg); Immed |], "vst2_lane", + store_3, P8 :: P16 :: F32 :: su_8_32; + Vstx_lane 2, + [Disassembles_as [Use_operands + [| VecArray (2, Element_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; VecArray (2, Qreg); Immed |], "vst2Q_lane", + store_3, [P16; F32; U16; U32; S16; S32]; + + (* VLD3 variants. *) + Vldx 3, [], Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg |], + "vld3", bits_1, pf_su_8_32; + Vldx 3, [Instruction_name ["vld1"]], + Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg |], + "vld3", bits_1, [S64; U64]; + Vldx 3, [Disassembles_as [Use_operands [| VecArray (3, Dreg); + CstPtrTo Corereg |]; + Use_operands [| VecArray (3, Dreg); + CstPtrTo Corereg |]]], + Use_operands [| VecArray (3, Qreg); CstPtrTo Corereg |], + "vld3Q", bits_1, P8 :: P16 :: F32 :: su_8_32; + + Vldx_lane 3, + [Disassembles_as [Use_operands + [| VecArray (3, Element_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg; + VecArray (3, Dreg); Immed |], + "vld3_lane", bits_3, P8 :: P16 :: F32 :: su_8_32; + Vldx_lane 3, + [Disassembles_as [Use_operands + [| VecArray (3, Element_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| VecArray (3, Qreg); CstPtrTo Corereg; + VecArray (3, Qreg); Immed |], + "vld3Q_lane", bits_3, [P16; F32; U16; U32; S16; S32]; + + Vldx_dup 3, + [Disassembles_as [Use_operands + [| VecArray (3, All_elements_of_dreg); CstPtrTo Corereg |]]], + Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg |], + "vld3_dup", bits_1, pf_su_8_32; + Vldx_dup 3, + [Instruction_name ["vld1"]; Disassembles_as [Use_operands + [| VecArray (3, Dreg); CstPtrTo Corereg |]]], + Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg |], + "vld3_dup", bits_1, [S64; U64]; + + (* VST3 variants. 
*) + Vstx 3, [Disassembles_as [Use_operands [| VecArray (4, Dreg); + PtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; VecArray (3, Dreg) |], "vst3", + store_1, pf_su_8_32; + Vstx 3, [Disassembles_as [Use_operands [| VecArray (4, Dreg); + PtrTo Corereg |]]; + Instruction_name ["vst1"]], + Use_operands [| PtrTo Corereg; VecArray (3, Dreg) |], "vst3", + store_1, [S64; U64]; + Vstx 3, [Disassembles_as [Use_operands [| VecArray (3, Dreg); + PtrTo Corereg |]; + Use_operands [| VecArray (3, Dreg); + PtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; VecArray (3, Qreg) |], "vst3Q", + store_1, pf_su_8_32; + + Vstx_lane 3, + [Disassembles_as [Use_operands + [| VecArray (3, Element_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; VecArray (3, Dreg); Immed |], "vst3_lane", + store_3, P8 :: P16 :: F32 :: su_8_32; + Vstx_lane 3, + [Disassembles_as [Use_operands + [| VecArray (3, Element_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; VecArray (3, Qreg); Immed |], "vst3Q_lane", + store_3, [P16; F32; U16; U32; S16; S32]; + + (* VLD4/VST4 variants. *) + Vldx 4, [], Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg |], + "vld4", bits_1, pf_su_8_32; + Vldx 4, [Instruction_name ["vld1"]], + Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg |], + "vld4", bits_1, [S64; U64]; + Vldx 4, [Disassembles_as [Use_operands [| VecArray (4, Dreg); + CstPtrTo Corereg |]; + Use_operands [| VecArray (4, Dreg); + CstPtrTo Corereg |]]], + Use_operands [| VecArray (4, Qreg); CstPtrTo Corereg |], + "vld4Q", bits_1, P8 :: P16 :: F32 :: su_8_32; + + Vldx_lane 4, + [Disassembles_as [Use_operands + [| VecArray (4, Element_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg; + VecArray (4, Dreg); Immed |], + "vld4_lane", bits_3, P8 :: P16 :: F32 :: su_8_32; + Vldx_lane 4, + [Disassembles_as [Use_operands + [| VecArray (4, Element_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| VecArray (4, Qreg); CstPtrTo Corereg; + VecArray (4, Qreg); Immed |], + "vld4Q_lane", bits_3, [P16; F32; U16; U32; S16; S32]; + + Vldx_dup 4, + [Disassembles_as [Use_operands + [| VecArray (4, All_elements_of_dreg); CstPtrTo Corereg |]]], + Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg |], + "vld4_dup", bits_1, pf_su_8_32; + Vldx_dup 4, + [Instruction_name ["vld1"]; Disassembles_as [Use_operands + [| VecArray (4, Dreg); CstPtrTo Corereg |]]], + Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg |], + "vld4_dup", bits_1, [S64; U64]; + + Vstx 4, [Disassembles_as [Use_operands [| VecArray (4, Dreg); + PtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; VecArray (4, Dreg) |], "vst4", + store_1, pf_su_8_32; + Vstx 4, [Disassembles_as [Use_operands [| VecArray (4, Dreg); + PtrTo Corereg |]]; + Instruction_name ["vst1"]], + Use_operands [| PtrTo Corereg; VecArray (4, Dreg) |], "vst4", + store_1, [S64; U64]; + Vstx 4, [Disassembles_as [Use_operands [| VecArray (4, Dreg); + PtrTo Corereg |]; + Use_operands [| VecArray (4, Dreg); + PtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; VecArray (4, Qreg) |], "vst4Q", + store_1, pf_su_8_32; + + Vstx_lane 4, + [Disassembles_as [Use_operands + [| VecArray (4, Element_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; VecArray (4, Dreg); Immed |], "vst4_lane", + store_3, P8 :: P16 :: F32 :: su_8_32; + Vstx_lane 4, + [Disassembles_as [Use_operands + [| VecArray (4, Element_of_dreg); + CstPtrTo Corereg |]]], + Use_operands [| PtrTo Corereg; VecArray (4, Qreg); Immed |], "vst4Q_lane", + store_3, [P16; F32; 
U16; U32; S16; S32]; + + (* Logical operations. And. *) + Vand, [], All (3, Dreg), "vand", notype_2, su_8_64; + Vand, [], All (3, Qreg), "vandQ", notype_2, su_8_64; + + (* Or. *) + Vorr, [], All (3, Dreg), "vorr", notype_2, su_8_64; + Vorr, [], All (3, Qreg), "vorrQ", notype_2, su_8_64; + + (* Eor. *) + Veor, [], All (3, Dreg), "veor", notype_2, su_8_64; + Veor, [], All (3, Qreg), "veorQ", notype_2, su_8_64; + + (* Bic (And-not). *) + Vbic, [], All (3, Dreg), "vbic", notype_2, su_8_64; + Vbic, [], All (3, Qreg), "vbicQ", notype_2, su_8_64; + + (* Or-not. *) + Vorn, [], All (3, Dreg), "vorn", notype_2, su_8_64; + Vorn, [], All (3, Qreg), "vornQ", notype_2, su_8_64; + ] + +let reinterp = + let elems = P8 :: P16 :: F32 :: su_8_64 in + List.fold_right + (fun convto acc -> + let types = List.fold_right + (fun convfrom acc -> + if convfrom <> convto then + Cast (convto, convfrom) :: acc + else + acc) + elems + [] + in + let dconv = Vreinterp, [No_op], Use_operands [| Dreg; Dreg |], + "vreinterpret", conv_1, types + and qconv = Vreinterp, [No_op], Use_operands [| Qreg; Qreg |], + "vreinterpretQ", conv_1, types in + dconv :: qconv :: acc) + elems + [] + +(* Output routines. *) + +let rec string_of_elt = function + S8 -> "s8" | S16 -> "s16" | S32 -> "s32" | S64 -> "s64" + | U8 -> "u8" | U16 -> "u16" | U32 -> "u32" | U64 -> "u64" + | I8 -> "i8" | I16 -> "i16" | I32 -> "i32" | I64 -> "i64" + | B8 -> "8" | B16 -> "16" | B32 -> "32" | B64 -> "64" + | F32 -> "f32" | P8 -> "p8" | P16 -> "p16" + | Conv (a, b) | Cast (a, b) -> string_of_elt a ^ "_" ^ string_of_elt b + | NoElts -> failwith "No elts" + +let string_of_elt_dots elt = + match elt with + Conv (a, b) | Cast (a, b) -> string_of_elt a ^ "." ^ string_of_elt b + | _ -> string_of_elt elt + +let string_of_vectype vt = + let rec name affix = function + T_int8x8 -> affix "int8x8" + | T_int8x16 -> affix "int8x16" + | T_int16x4 -> affix "int16x4" + | T_int16x8 -> affix "int16x8" + | T_int32x2 -> affix "int32x2" + | T_int32x4 -> affix "int32x4" + | T_int64x1 -> affix "int64x1" + | T_int64x2 -> affix "int64x2" + | T_uint8x8 -> affix "uint8x8" + | T_uint8x16 -> affix "uint8x16" + | T_uint16x4 -> affix "uint16x4" + | T_uint16x8 -> affix "uint16x8" + | T_uint32x2 -> affix "uint32x2" + | T_uint32x4 -> affix "uint32x4" + | T_uint64x1 -> affix "uint64x1" + | T_uint64x2 -> affix "uint64x2" + | T_float32x2 -> affix "float32x2" + | T_float32x4 -> affix "float32x4" + | T_poly8x8 -> affix "poly8x8" + | T_poly8x16 -> affix "poly8x16" + | T_poly16x4 -> affix "poly16x4" + | T_poly16x8 -> affix "poly16x8" + | T_int8 -> affix "int8" + | T_int16 -> affix "int16" + | T_int32 -> affix "int32" + | T_int64 -> affix "int64" + | T_uint8 -> affix "uint8" + | T_uint16 -> affix "uint16" + | T_uint32 -> affix "uint32" + | T_uint64 -> affix "uint64" + | T_poly8 -> affix "poly8" + | T_poly16 -> affix "poly16" + | T_float32 -> affix "float32" + | T_immediate _ -> "const int" + | T_void -> "void" + | T_intQI -> "__builtin_neon_qi" + | T_intHI -> "__builtin_neon_hi" + | T_intSI -> "__builtin_neon_si" + | T_intDI -> "__builtin_neon_di" + | T_arrayof (num, base) -> + let basename = name (fun x -> x) base in + affix (Printf.sprintf "%sx%d" basename num) + | T_ptrto x -> + let basename = name affix x in + Printf.sprintf "%s *" basename + | T_const x -> + let basename = name affix x in + Printf.sprintf "const %s" basename + in + name (fun x -> x ^ "_t") vt + +let string_of_inttype = function + B_TImode -> "__builtin_neon_ti" + | B_EImode -> "__builtin_neon_ei" + | B_OImode -> 
"__builtin_neon_oi" + | B_CImode -> "__builtin_neon_ci" + | B_XImode -> "__builtin_neon_xi" + +let string_of_mode = function + V8QI -> "v8qi" | V4HI -> "v4hi" | V2SI -> "v2si" | V2SF -> "v2sf" + | DI -> "di" | V16QI -> "v16qi" | V8HI -> "v8hi" | V4SI -> "v4si" + | V4SF -> "v4sf" | V2DI -> "v2di" | QI -> "qi" | HI -> "hi" | SI -> "si" + | SF -> "sf" + +(* Use uppercase chars for letters which form part of the intrinsic name, but + should be omitted from the builtin name (the info is passed in an extra + argument, instead). *) +let intrinsic_name name = String.lowercase name + +(* Allow the name of the builtin to be overridden by things (e.g. Flipped) + found in the features list. *) +let builtin_name features name = + let name = List.fold_right + (fun el name -> + match el with + Flipped x | Builtin_name x -> x + | _ -> name) + features name in + let islower x = let str = String.make 1 x in (String.lowercase str) = str + and buf = Buffer.create (String.length name) in + String.iter (fun c -> if islower c then Buffer.add_char buf c) name; + Buffer.contents buf + +(* Transform an arity into a list of strings. *) +let strings_of_arity a = + match a with + | Arity0 vt -> [string_of_vectype vt] + | Arity1 (vt1, vt2) -> [string_of_vectype vt1; string_of_vectype vt2] + | Arity2 (vt1, vt2, vt3) -> [string_of_vectype vt1; + string_of_vectype vt2; + string_of_vectype vt3] + | Arity3 (vt1, vt2, vt3, vt4) -> [string_of_vectype vt1; + string_of_vectype vt2; + string_of_vectype vt3; + string_of_vectype vt4] + | Arity4 (vt1, vt2, vt3, vt4, vt5) -> [string_of_vectype vt1; + string_of_vectype vt2; + string_of_vectype vt3; + string_of_vectype vt4; + string_of_vectype vt5] + +(* Suffixes on the end of builtin names that are to be stripped in order + to obtain the name used as an instruction. They are only stripped if + preceded immediately by an underscore. *) +let suffixes_to_strip = [ "n"; "lane"; "dup" ] + +(* Get the possible names of an instruction corresponding to a "name" from the + ops table. This is done by getting the equivalent builtin name and + stripping any suffixes from the list at the top of this file, unless + the features list presents with an Instruction_name entry, in which + case that is used; or unless the features list presents with a Flipped + entry, in which case that is used. If both such entries are present, + the first in the list will be chosen. *) +let get_insn_names features name = + let names = try + begin + match List.find (fun feature -> match feature with + Instruction_name _ -> true + | Flipped _ -> true + | _ -> false) features + with + Instruction_name names -> names + | Flipped name -> [name] + | _ -> assert false + end + with Not_found -> [builtin_name features name] + in + begin + List.map (fun name' -> + try + let underscore = String.rindex name' '_' in + let our_suffix = String.sub name' (underscore + 1) + ((String.length name') - underscore - 1) + in + let rec strip remaining_suffixes = + match remaining_suffixes with + [] -> name' + | s::ss when our_suffix = s -> String.sub name' 0 underscore + | _::ss -> strip ss + in + strip suffixes_to_strip + with (Not_found | Invalid_argument _) -> name') names + end + +(* Apply a function to each element of a list and then comma-separate + the resulting strings. 
*) +let rec commas f elts acc = + match elts with + [] -> acc + | [elt] -> acc ^ (f elt) + | elt::elts -> + commas f elts (acc ^ (f elt) ^ ", ") + +(* Given a list of features and the shape specified in the "ops" table, apply + a function to each possible shape that the instruction may have. + By default, this is the "shape" entry in "ops". If the features list + contains a Disassembles_as entry, the shapes contained in that entry are + mapped to corresponding outputs and returned in a list. If there is more + than one Disassembles_as entry, only the first is used. *) +let analyze_all_shapes features shape f = + try + match List.find (fun feature -> + match feature with Disassembles_as _ -> true + | _ -> false) + features with + Disassembles_as shapes -> List.map f shapes + | _ -> assert false + with Not_found -> [f shape] + diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md index 06d8371..b8d154d 100644 --- a/gcc/config/arm/predicates.md +++ b/gcc/config/arm/predicates.md @@ -470,3 +470,43 @@ (match_test "((unsigned HOST_WIDE_INT) INTVAL (op)) < 64"))) +;; Neon predicates + +(define_predicate "const_multiple_of_8_operand" + (match_code "const_int") +{ + unsigned HOST_WIDE_INT val = INTVAL (op); + return (val & 7) == 0; +}) + +(define_predicate "imm_for_neon_mov_operand" + (match_code "const_vector") +{ + return neon_immediate_valid_for_move (op, mode, NULL, NULL); +}) + +(define_predicate "imm_for_neon_logic_operand" + (match_code "const_vector") +{ + return neon_immediate_valid_for_logic (op, mode, 0, NULL, NULL); +}) + +(define_predicate "imm_for_neon_inv_logic_operand" + (match_code "const_vector") +{ + return neon_immediate_valid_for_logic (op, mode, 1, NULL, NULL); +}) + +(define_predicate "neon_logic_op2" + (ior (match_operand 0 "imm_for_neon_logic_operand") + (match_operand 0 "s_register_operand"))) + +(define_predicate "neon_inv_logic_op2" + (ior (match_operand 0 "imm_for_neon_inv_logic_operand") + (match_operand 0 "s_register_operand"))) + +;; TODO: We could check lane numbers more precisely based on the mode. +(define_predicate "neon_lane_number" + (and (match_code "const_int") + (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 7"))) + diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm index 1727407..cde00ee 100644 --- a/gcc/config/arm/t-arm +++ b/gcc/config/arm/t-arm @@ -9,8 +9,10 @@ MD_INCLUDES= $(srcdir)/config/arm/arm-tune.md \ $(srcdir)/config/arm/arm926ejs.md \ $(srcdir)/config/arm/cirrus.md \ $(srcdir)/config/arm/fpa.md \ + $(srcdir)/config/arm/vec-common.md \ $(srcdir)/config/arm/iwmmxt.md \ $(srcdir)/config/arm/vfp.md \ + $(srcdir)/config/arm/neon.md \ $(srcdir)/config/arm/thumb2.md s-config s-conditions s-flags s-codes s-constants s-emit s-recog s-preds \ diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md new file mode 100644 index 0000000..0514b81 --- /dev/null +++ b/gcc/config/arm/vec-common.md @@ -0,0 +1,107 @@ +;; Machine Description for shared bits common to IWMMXT and Neon. +;; Copyright (C) 2006 Free Software Foundation, Inc. +;; Written by CodeSourcery. +;; +;; This file is part of GCC. +;; +;; GCC is free software; you can redistribute it and/or modify it +;; under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 2, or (at your option) +;; any later version. +;; +;; GCC is distributed in the hope that it will be useful, but +;; WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU +;; General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING. If not, write to the Free +;; Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA +;; 02110-1301, USA. + +;; Vector Moves + +;; All integer and float modes supported by Neon and IWMMXT. +(define_mode_macro VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF]) + +;; All integer and float modes supported by Neon and IWMMXT, except V2DI. +(define_mode_macro VALLW [V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF]) + +;; All integer modes supported by Neon and IWMMXT. +(define_mode_macro VINT [V2DI V2SI V4HI V8QI V4SI V8HI V16QI]) + +;; All integer modes supported by Neon and IWMMXT, except V2DI. +(define_mode_macro VINTW [V2SI V4HI V8QI V4SI V8HI V16QI]) + +(define_expand "mov<mode>" + [(set (match_operand:VALL 0 "nonimmediate_operand" "") + (match_operand:VALL 1 "general_operand" ""))] + "TARGET_NEON + || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))" +{ +}) + +;; Vector arithmetic. Expanders are blank, then unnamed insns implement +;; patterns separately for IWMMXT and Neon. + +(define_expand "add<mode>3" + [(set (match_operand:VALL 0 "s_register_operand" "") + (plus:VALL (match_operand:VALL 1 "s_register_operand" "") + (match_operand:VALL 2 "s_register_operand" "")))] + "TARGET_NEON + || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))" +{ +}) + +(define_expand "sub<mode>3" + [(set (match_operand:VALL 0 "s_register_operand" "") + (minus:VALL (match_operand:VALL 1 "s_register_operand" "") + (match_operand:VALL 2 "s_register_operand" "")))] + "TARGET_NEON + || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))" +{ +}) + +(define_expand "mul<mode>3" + [(set (match_operand:VALLW 0 "s_register_operand" "") + (mult:VALLW (match_operand:VALLW 1 "s_register_operand" "") + (match_operand:VALLW 2 "s_register_operand" "")))] + "TARGET_NEON || (<MODE>mode == V4HImode && TARGET_REALLY_IWMMXT)" +{ +}) + +(define_expand "smin<mode>3" + [(set (match_operand:VALLW 0 "s_register_operand" "") + (smin:VALLW (match_operand:VALLW 1 "s_register_operand" "") + (match_operand:VALLW 2 "s_register_operand" "")))] + "TARGET_NEON + || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))" +{ +}) + +(define_expand "umin<mode>3" + [(set (match_operand:VINTW 0 "s_register_operand" "") + (umin:VINTW (match_operand:VINTW 1 "s_register_operand" "") + (match_operand:VINTW 2 "s_register_operand" "")))] + "TARGET_NEON + || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))" +{ +}) + +(define_expand "smax<mode>3" + [(set (match_operand:VALLW 0 "s_register_operand" "") + (smax:VALLW (match_operand:VALLW 1 "s_register_operand" "") + (match_operand:VALLW 2 "s_register_operand" "")))] + "TARGET_NEON + || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))" +{ +}) + +(define_expand "umax<mode>3" + [(set (match_operand:VINTW 0 "s_register_operand" "") + (umax:VINTW (match_operand:VINTW 1 "s_register_operand" "") + (match_operand:VINTW 2 "s_register_operand" "")))] + "TARGET_NEON + || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))" +{ +})
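+ +;; As an illustrative sketch: with the VALL macro above, "mov<mode>" +;; expands to movv2di, movv2si, ..., movv4sf, and since each C body is +;; empty the expanders simply emit their RTL templates, leaving the +;; actual instruction selection to the Neon and IWMMXT patterns.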