aboutsummaryrefslogtreecommitdiff
path: root/gas
diff options
context:
space:
mode:
authorH.J. Lu <hjl.tools@gmail.com>2019-12-12 11:56:06 -0800
committerH.J. Lu <hjl.tools@gmail.com>2019-12-12 12:03:45 -0800
commite379e5f385f874adb0b414f917adb1fc50e20de9 (patch)
treee9a1ad41c0582dd54bfebe9391afb382105f907c /gas
parentb65d68b8d60338a860e169aac3bac54e6bac60db (diff)
downloadfsf-binutils-gdb-e379e5f385f874adb0b414f917adb1fc50e20de9.zip
fsf-binutils-gdb-e379e5f385f874adb0b414f917adb1fc50e20de9.tar.gz
fsf-binutils-gdb-e379e5f385f874adb0b414f917adb1fc50e20de9.tar.bz2
i386: Align branches within a fixed boundary
Add 3 command-line options to align branches within a fixed boundary with segment prefixes or NOPs: 1. -malign-branch-boundary=NUM aligns branches within NUM byte boundary. 2. -malign-branch=TYPE[+TYPE...] specifies types of branches to align. The supported branches are: a. Conditional jump. b. Fused conditional jump. c. Unconditional jump. d. Call. e. Ret. f. Indirect jump and call. 3. -malign-branch-prefix-size=NUM aligns branches with NUM segment prefixes per instruction. 3 new rs_machine_dependent frag types are added: 1. BRANCH_PADDING. The variable size frag to insert NOP before branch. 2. BRANCH_PREFIX. The variable size frag to insert segment prefixes to an instruction. The choices of prefixes are: a. Use the existing segment prefix if there is one. b. Use CS segment prefix in 64-bit mode. c. In 32-bit mode, use SS segment prefix with ESP/EBP base register and use DS segment prefix without ESP/EBP base register. 3. FUSED_JCC_PADDING. The variable size frag to insert NOP before fused conditional jump. The new rs_machine_dependent frags aren't inserted if the previous item is a prefix or a constant directive, which may be used to hardcode an instruction, since there is no clear instruction boundary. Segment prefixes and NOP padding are disabled before relaxable TLS relocations and tls_get_addr calls to keep TLS instruction sequence unchanged. md_estimate_size_before_relax() and i386_generic_table_relax_frag() are used to handled BRANCH_PADDING, BRANCH_PREFIX and FUSED_JCC_PADDING frags. i386_generic_table_relax_frag() grows or shrinks sizes of segment prefix and NOP to align the next branch frag: 1. First try to add segment prefixes to instructions before a branch. 2. If there is no sufficient room to add segment prefixes, NOP will be inserted before a branch. * config/tc-i386.c (_i386_insn): Add has_gotpc_tls_reloc. (tls_get_addr): New. (last_insn): New. (align_branch_power): New. (align_branch_kind): New. (align_branch_bit): New. (align_branch): New. (MAX_FUSED_JCC_PADDING_SIZE): New. (align_branch_prefix_size): New. (BRANCH_PADDING): New. (BRANCH_PREFIX): New. (FUSED_JCC_PADDING): New. (i386_generate_nops): Support BRANCH_PADDING and FUSED_JCC_PADDING. (md_begin): Abort if align_branch_prefix_size < MAX_FUSED_JCC_PADDING_SIZE. (md_assemble): Set last_insn. (maybe_fused_with_jcc_p): New. (add_fused_jcc_padding_frag_p): New. (add_branch_prefix_frag_p): New. (add_branch_padding_frag_p): New. (output_insn): Generate a BRANCH_PADDING, FUSED_JCC_PADDING or BRANCH_PREFIX frag and terminate each frag to align branches. (output_disp): Set i.has_gotpc_tls_reloc to TRUE for GOTPC and relaxable TLS relocations. (output_imm): Likewise. (i386_next_non_empty_frag): New. (i386_next_jcc_frag): New. (i386_classify_machine_dependent_frag): New. (i386_branch_padding_size): New. (i386_generic_table_relax_frag): New. (md_estimate_size_before_relax): Handle COND_JUMP_PADDING, FUSED_JCC_PADDING and COND_JUMP_PREFIX frags. (md_convert_frag): Handle BRANCH_PADDING, BRANCH_PREFIX and FUSED_JCC_PADDING frags. (OPTION_MALIGN_BRANCH_BOUNDARY): New. (OPTION_MALIGN_BRANCH_PREFIX_SIZE): New. (OPTION_MALIGN_BRANCH): New. (md_longopts): Add -malign-branch-boundary=, -malign-branch-prefix-size= and -malign-branch=. (md_parse_option): Handle -malign-branch-boundary=, -malign-branch-prefix-size= and -malign-branch=. (md_show_usage): Display -malign-branch-boundary=, -malign-branch-prefix-size= and -malign-branch=. (i386_target_format): Set tls_get_addr. (i386_cons_align): New. * config/tc-i386.h (i386_cons_align): New. (md_cons_align): New. (i386_generic_table_relax_frag): New. (md_generic_table_relax_frag): New. (i386_tc_frag_data): Add u, padding_address, length, max_prefix_length, prefix_length, default_prefix, cmp_size, classified and branch_type. (TC_FRAG_INIT): Initialize u, padding_address, length, max_prefix_length, prefix_length, default_prefix, cmp_size, classified and branch_type. * doc/c-i386.texi: Document -malign-branch-boundary=, -malign-branch= and -malign-branch-prefix-size=.
Diffstat (limited to 'gas')
-rw-r--r--gas/ChangeLog60
-rw-r--r--gas/config/tc-i386.c1046
-rw-r--r--gas/config/tc-i386.h31
-rw-r--r--gas/doc/c-i386.texi26
4 files changed, 1160 insertions, 3 deletions
diff --git a/gas/ChangeLog b/gas/ChangeLog
index ef7369c..7b83a0a 100644
--- a/gas/ChangeLog
+++ b/gas/ChangeLog
@@ -1,5 +1,65 @@
2019-12-12 H.J. Lu <hongjiu.lu@intel.com>
+ * config/tc-i386.c (_i386_insn): Add has_gotpc_tls_reloc.
+ (tls_get_addr): New.
+ (last_insn): New.
+ (align_branch_power): New.
+ (align_branch_kind): New.
+ (align_branch_bit): New.
+ (align_branch): New.
+ (MAX_FUSED_JCC_PADDING_SIZE): New.
+ (align_branch_prefix_size): New.
+ (BRANCH_PADDING): New.
+ (BRANCH_PREFIX): New.
+ (FUSED_JCC_PADDING): New.
+ (i386_generate_nops): Support BRANCH_PADDING and FUSED_JCC_PADDING.
+ (md_begin): Abort if align_branch_prefix_size <
+ MAX_FUSED_JCC_PADDING_SIZE.
+ (md_assemble): Set last_insn.
+ (maybe_fused_with_jcc_p): New.
+ (add_fused_jcc_padding_frag_p): New.
+ (add_branch_prefix_frag_p): New.
+ (add_branch_padding_frag_p): New.
+ (output_insn): Generate a BRANCH_PADDING, FUSED_JCC_PADDING or
+ BRANCH_PREFIX frag and terminate each frag to align branches.
+ (output_disp): Set i.has_gotpc_tls_reloc to TRUE for GOTPC and
+ relaxable TLS relocations.
+ (output_imm): Likewise.
+ (i386_next_non_empty_frag): New.
+ (i386_next_jcc_frag): New.
+ (i386_classify_machine_dependent_frag): New.
+ (i386_branch_padding_size): New.
+ (i386_generic_table_relax_frag): New.
+ (md_estimate_size_before_relax): Handle COND_JUMP_PADDING,
+ FUSED_JCC_PADDING and COND_JUMP_PREFIX frags.
+ (md_convert_frag): Handle BRANCH_PADDING, BRANCH_PREFIX and
+ FUSED_JCC_PADDING frags.
+ (OPTION_MALIGN_BRANCH_BOUNDARY): New.
+ (OPTION_MALIGN_BRANCH_PREFIX_SIZE): New.
+ (OPTION_MALIGN_BRANCH): New.
+ (md_longopts): Add -malign-branch-boundary=,
+ -malign-branch-prefix-size= and -malign-branch=.
+ (md_parse_option): Handle -malign-branch-boundary=,
+ -malign-branch-prefix-size= and -malign-branch=.
+ (md_show_usage): Display -malign-branch-boundary=,
+ -malign-branch-prefix-size= and -malign-branch=.
+ (i386_target_format): Set tls_get_addr.
+ (i386_cons_align): New.
+ * config/tc-i386.h (i386_cons_align): New.
+ (md_cons_align): New.
+ (i386_generic_table_relax_frag): New.
+ (md_generic_table_relax_frag): New.
+ (i386_tc_frag_data): Add u, padding_address, length,
+ max_prefix_length, prefix_length, default_prefix, cmp_size,
+ classified and branch_type.
+ (TC_FRAG_INIT): Initialize u, padding_address, length,
+ max_prefix_length, prefix_length, default_prefix, cmp_size,
+ classified and branch_type.
+ * doc/c-i386.texi: Document -malign-branch-boundary=,
+ -malign-branch= and -malign-branch-prefix-size=.
+
+2019-12-12 H.J. Lu <hongjiu.lu@intel.com>
+
* write.c (md_generic_table_relax_frag): New. Defined to
relax_frag if not defined.
(relax_segment): Call md_generic_table_relax_frag instead of
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index accb634..41a6a08 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -368,6 +368,9 @@ struct _i386_insn
/* Has ZMM register operands. */
bfd_boolean has_regzmm;
+ /* Has GOTPC or TLS relocation. */
+ bfd_boolean has_gotpc_tls_reloc;
+
/* RM and SIB are the modrm byte and the sib byte where the
addressing modes of this insn are encoded. */
modrm_byte rm;
@@ -562,6 +565,8 @@ static enum flag_code flag_code;
static unsigned int object_64bit;
static unsigned int disallow_64bit_reloc;
static int use_rela_relocations = 0;
+/* __tls_get_addr/___tls_get_addr symbol for TLS. */
+static const char *tls_get_addr;
#if ((defined (OBJ_MAYBE_COFF) && defined (OBJ_MAYBE_AOUT)) \
|| defined (OBJ_ELF) || defined (OBJ_MAYBE_ELF) \
@@ -622,6 +627,21 @@ static int omit_lock_prefix = 0;
"lock addl $0, (%{re}sp)". */
static int avoid_fence = 0;
+/* Type of the previous instruction. */
+static struct
+ {
+ segT seg;
+ const char *file;
+ const char *name;
+ unsigned int line;
+ enum last_insn_kind
+ {
+ last_insn_other = 0,
+ last_insn_directive,
+ last_insn_prefix
+ } kind;
+ } last_insn;
+
/* 1 if the assembler should generate relax relocations. */
static int generate_relax_relocations
@@ -635,6 +655,44 @@ static enum check_kind
}
sse_check, operand_check = check_warning;
+/* Non-zero if branches should be aligned within power of 2 boundary. */
+static int align_branch_power = 0;
+
+/* Types of branches to align. */
+enum align_branch_kind
+ {
+ align_branch_none = 0,
+ align_branch_jcc = 1,
+ align_branch_fused = 2,
+ align_branch_jmp = 3,
+ align_branch_call = 4,
+ align_branch_indirect = 5,
+ align_branch_ret = 6
+ };
+
+/* Type bits of branches to align. */
+enum align_branch_bit
+ {
+ align_branch_jcc_bit = 1 << align_branch_jcc,
+ align_branch_fused_bit = 1 << align_branch_fused,
+ align_branch_jmp_bit = 1 << align_branch_jmp,
+ align_branch_call_bit = 1 << align_branch_call,
+ align_branch_indirect_bit = 1 << align_branch_indirect,
+ align_branch_ret_bit = 1 << align_branch_ret
+ };
+
+static unsigned int align_branch = (align_branch_jcc_bit
+ | align_branch_fused_bit
+ | align_branch_jmp_bit);
+
+/* The maximum padding size for fused jcc. CMP like instruction can
+ be 9 bytes and jcc can be 6 bytes. Leave room just in case for
+ prefixes. */
+#define MAX_FUSED_JCC_PADDING_SIZE 20
+
+/* The maximum number of prefixes added for an instruction. */
+static unsigned int align_branch_prefix_size = 5;
+
/* Optimization:
1. Clear the REX_W bit with register operand if possible.
2. Above plus use 128bit vector instruction to clear the full vector
@@ -738,12 +796,19 @@ int x86_cie_data_alignment;
/* Interface to relax_segment.
There are 3 major relax states for 386 jump insns because the
different types of jumps add different sizes to frags when we're
- figuring out what sort of jump to choose to reach a given label. */
+ figuring out what sort of jump to choose to reach a given label.
+
+ BRANCH_PADDING, BRANCH_PREFIX and FUSED_JCC_PADDING are used to align
+ branches which are handled by md_estimate_size_before_relax() and
+ i386_generic_table_relax_frag(). */
/* Types. */
#define UNCOND_JUMP 0
#define COND_JUMP 1
#define COND_JUMP86 2
+#define BRANCH_PADDING 3
+#define BRANCH_PREFIX 4
+#define FUSED_JCC_PADDING 5
/* Sizes. */
#define CODE16 1
@@ -1384,6 +1449,12 @@ i386_generate_nops (fragS *fragP, char *where, offsetT count, int limit)
case rs_fill_nop:
case rs_align_code:
break;
+ case rs_machine_dependent:
+ /* Allow NOP padding for jumps and calls. */
+ if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PADDING
+ || TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == FUSED_JCC_PADDING)
+ break;
+ /* Fall through. */
default:
return;
}
@@ -1528,7 +1599,7 @@ i386_generate_nops (fragS *fragP, char *where, offsetT count, int limit)
return;
}
}
- else
+ else if (fragP->fr_type != rs_machine_dependent)
fragP->fr_var = count;
if ((count / max_single_nop_size) > max_number_of_nops)
@@ -3011,6 +3082,11 @@ md_begin (void)
x86_dwarf2_return_column = 8;
x86_cie_data_alignment = -4;
}
+
+ /* NB: FUSED_JCC_PADDING frag must have sufficient room so that it
+ can be turned into BRANCH_PREFIX frag. */
+ if (align_branch_prefix_size > MAX_FUSED_JCC_PADDING_SIZE)
+ abort ();
}
void
@@ -4539,6 +4615,17 @@ md_assemble (char *line)
/* We are ready to output the insn. */
output_insn ();
+
+ last_insn.seg = now_seg;
+
+ if (i.tm.opcode_modifier.isprefix)
+ {
+ last_insn.kind = last_insn_prefix;
+ last_insn.name = i.tm.name;
+ last_insn.file = as_where (&last_insn.line);
+ }
+ else
+ last_insn.kind = last_insn_other;
}
static char *
@@ -8196,11 +8283,206 @@ encoding_length (const fragS *start_frag, offsetT start_off,
return len - start_off + (frag_now_ptr - frag_now->fr_literal);
}
+/* Return 1 for test, and, cmp, add, sub, inc and dec which may
+ be macro-fused with conditional jumps. */
+
+static int
+maybe_fused_with_jcc_p (void)
+{
+ /* No RIP address. */
+ if (i.base_reg && i.base_reg->reg_num == RegIP)
+ return 0;
+
+ /* No VEX/EVEX encoding. */
+ if (is_any_vex_encoding (&i.tm))
+ return 0;
+
+ /* and, add, sub with destination register. */
+ if ((i.tm.base_opcode >= 0x20 && i.tm.base_opcode <= 0x25)
+ || i.tm.base_opcode <= 5
+ || (i.tm.base_opcode >= 0x28 && i.tm.base_opcode <= 0x2d)
+ || ((i.tm.base_opcode | 3) == 0x83
+ && ((i.tm.extension_opcode | 1) == 0x5
+ || i.tm.extension_opcode == 0x0)))
+ return (i.types[1].bitfield.class == Reg
+ || i.types[1].bitfield.instance == Accum);
+
+ /* test, cmp with any register. */
+ if ((i.tm.base_opcode | 1) == 0x85
+ || (i.tm.base_opcode | 1) == 0xa9
+ || ((i.tm.base_opcode | 1) == 0xf7
+ && i.tm.extension_opcode == 0)
+ || (i.tm.base_opcode >= 0x38 && i.tm.base_opcode <= 0x3d)
+ || ((i.tm.base_opcode | 3) == 0x83
+ && (i.tm.extension_opcode == 0x7)))
+ return (i.types[0].bitfield.class == Reg
+ || i.types[0].bitfield.instance == Accum
+ || i.types[1].bitfield.class == Reg
+ || i.types[1].bitfield.instance == Accum);
+
+ /* inc, dec with any register. */
+ if ((i.tm.cpu_flags.bitfield.cpuno64
+ && (i.tm.base_opcode | 0xf) == 0x4f)
+ || ((i.tm.base_opcode | 1) == 0xff
+ && i.tm.extension_opcode <= 0x1))
+ return (i.types[0].bitfield.class == Reg
+ || i.types[0].bitfield.instance == Accum);
+
+ return 0;
+}
+
+/* Return 1 if a FUSED_JCC_PADDING frag should be generated. */
+
+static int
+add_fused_jcc_padding_frag_p (void)
+{
+ /* NB: Don't work with COND_JUMP86 without i386. */
+ if (!align_branch_power
+ || now_seg == absolute_section
+ || !cpu_arch_flags.bitfield.cpui386
+ || !(align_branch & align_branch_fused_bit))
+ return 0;
+
+ if (maybe_fused_with_jcc_p ())
+ {
+ if (last_insn.kind == last_insn_other
+ || last_insn.seg != now_seg)
+ return 1;
+ if (flag_debug)
+ as_warn_where (last_insn.file, last_insn.line,
+ _("`%s` skips -malign-branch-boundary on `%s`"),
+ last_insn.name, i.tm.name);
+ }
+
+ return 0;
+}
+
+/* Return 1 if a BRANCH_PREFIX frag should be generated. */
+
+static int
+add_branch_prefix_frag_p (void)
+{
+ /* NB: Don't work with COND_JUMP86 without i386. Don't add prefix
+ to PadLock instructions since they include prefixes in opcode. */
+ if (!align_branch_power
+ || !align_branch_prefix_size
+ || now_seg == absolute_section
+ || i.tm.cpu_flags.bitfield.cpupadlock
+ || !cpu_arch_flags.bitfield.cpui386)
+ return 0;
+
+ /* Don't add prefix if it is a prefix or there is no operand in case
+ that segment prefix is special. */
+ if (!i.operands || i.tm.opcode_modifier.isprefix)
+ return 0;
+
+ if (last_insn.kind == last_insn_other
+ || last_insn.seg != now_seg)
+ return 1;
+
+ if (flag_debug)
+ as_warn_where (last_insn.file, last_insn.line,
+ _("`%s` skips -malign-branch-boundary on `%s`"),
+ last_insn.name, i.tm.name);
+
+ return 0;
+}
+
+/* Return 1 if a BRANCH_PADDING frag should be generated. */
+
+static int
+add_branch_padding_frag_p (enum align_branch_kind *branch_p)
+{
+ int add_padding;
+
+ /* NB: Don't work with COND_JUMP86 without i386. */
+ if (!align_branch_power
+ || now_seg == absolute_section
+ || !cpu_arch_flags.bitfield.cpui386)
+ return 0;
+
+ add_padding = 0;
+
+ /* Check for jcc and direct jmp. */
+ if (i.tm.opcode_modifier.jump == JUMP)
+ {
+ if (i.tm.base_opcode == JUMP_PC_RELATIVE)
+ {
+ *branch_p = align_branch_jmp;
+ add_padding = align_branch & align_branch_jmp_bit;
+ }
+ else
+ {
+ *branch_p = align_branch_jcc;
+ if ((align_branch & align_branch_jcc_bit))
+ add_padding = 1;
+ }
+ }
+ else if (is_any_vex_encoding (&i.tm))
+ return 0;
+ else if ((i.tm.base_opcode | 1) == 0xc3)
+ {
+ /* Near ret. */
+ *branch_p = align_branch_ret;
+ if ((align_branch & align_branch_ret_bit))
+ add_padding = 1;
+ }
+ else
+ {
+ /* Check for indirect jmp, direct and indirect calls. */
+ if (i.tm.base_opcode == 0xe8)
+ {
+ /* Direct call. */
+ *branch_p = align_branch_call;
+ if ((align_branch & align_branch_call_bit))
+ add_padding = 1;
+ }
+ else if (i.tm.base_opcode == 0xff
+ && (i.tm.extension_opcode == 2
+ || i.tm.extension_opcode == 4))
+ {
+ /* Indirect call and jmp. */
+ *branch_p = align_branch_indirect;
+ if ((align_branch & align_branch_indirect_bit))
+ add_padding = 1;
+ }
+
+ if (add_padding
+ && i.disp_operands
+ && tls_get_addr
+ && (i.op[0].disps->X_op == O_symbol
+ || (i.op[0].disps->X_op == O_subtract
+ && i.op[0].disps->X_op_symbol == GOT_symbol)))
+ {
+ symbolS *s = i.op[0].disps->X_add_symbol;
+ /* No padding to call to global or undefined tls_get_addr. */
+ if ((S_IS_EXTERNAL (s) || !S_IS_DEFINED (s))
+ && strcmp (S_GET_NAME (s), tls_get_addr) == 0)
+ return 0;
+ }
+ }
+
+ if (add_padding
+ && last_insn.kind != last_insn_other
+ && last_insn.seg == now_seg)
+ {
+ if (flag_debug)
+ as_warn_where (last_insn.file, last_insn.line,
+ _("`%s` skips -malign-branch-boundary on `%s`"),
+ last_insn.name, i.tm.name);
+ return 0;
+ }
+
+ return add_padding;
+}
+
static void
output_insn (void)
{
fragS *insn_start_frag;
offsetT insn_start_off;
+ fragS *fragP = NULL;
+ enum align_branch_kind branch = align_branch_none;
#if defined (OBJ_ELF) || defined (OBJ_MAYBE_ELF)
if (IS_ELF && x86_used_note)
@@ -8291,6 +8573,31 @@ output_insn (void)
insn_start_frag = frag_now;
insn_start_off = frag_now_fix ();
+ if (add_branch_padding_frag_p (&branch))
+ {
+ char *p;
+ /* Branch can be 8 bytes. Leave some room for prefixes. */
+ unsigned int max_branch_padding_size = 14;
+
+ /* Align section to boundary. */
+ record_alignment (now_seg, align_branch_power);
+
+ /* Make room for padding. */
+ frag_grow (max_branch_padding_size);
+
+ /* Start of the padding. */
+ p = frag_more (0);
+
+ fragP = frag_now;
+
+ frag_var (rs_machine_dependent, max_branch_padding_size, 0,
+ ENCODE_RELAX_STATE (BRANCH_PADDING, 0),
+ NULL, 0, p);
+
+ fragP->tc_frag_data.branch_type = branch;
+ fragP->tc_frag_data.max_bytes = max_branch_padding_size;
+ }
+
/* Output jumps. */
if (i.tm.opcode_modifier.jump == JUMP)
output_branch ();
@@ -8329,6 +8636,41 @@ output_insn (void)
i.prefix[LOCK_PREFIX] = 0;
}
+ if (branch)
+ /* Skip if this is a branch. */
+ ;
+ else if (add_fused_jcc_padding_frag_p ())
+ {
+ /* Make room for padding. */
+ frag_grow (MAX_FUSED_JCC_PADDING_SIZE);
+ p = frag_more (0);
+
+ fragP = frag_now;
+
+ frag_var (rs_machine_dependent, MAX_FUSED_JCC_PADDING_SIZE, 0,
+ ENCODE_RELAX_STATE (FUSED_JCC_PADDING, 0),
+ NULL, 0, p);
+
+ fragP->tc_frag_data.branch_type = align_branch_fused;
+ fragP->tc_frag_data.max_bytes = MAX_FUSED_JCC_PADDING_SIZE;
+ }
+ else if (add_branch_prefix_frag_p ())
+ {
+ unsigned int max_prefix_size = align_branch_prefix_size;
+
+ /* Make room for padding. */
+ frag_grow (max_prefix_size);
+ p = frag_more (0);
+
+ fragP = frag_now;
+
+ frag_var (rs_machine_dependent, max_prefix_size, 0,
+ ENCODE_RELAX_STATE (BRANCH_PREFIX, 0),
+ NULL, 0, p);
+
+ fragP->tc_frag_data.max_bytes = max_prefix_size;
+ }
+
/* Since the VEX/EVEX prefix contains the implicit prefix, we
don't need the explicit prefix. */
if (!i.tm.opcode_modifier.vex && !i.tm.opcode_modifier.evex)
@@ -8476,9 +8818,105 @@ output_insn (void)
if (j > 15)
as_warn (_("instruction length of %u bytes exceeds the limit of 15"),
j);
+ else if (fragP)
+ {
+ /* NB: Don't add prefix with GOTPC relocation since
+ output_disp() above depends on the fixed encoding
+ length. Can't add prefix with TLS relocation since
+ it breaks TLS linker optimization. */
+ unsigned int max = i.has_gotpc_tls_reloc ? 0 : 15 - j;
+ /* Prefix count on the current instruction. */
+ unsigned int count = i.vex.length;
+ unsigned int k;
+ for (k = 0; k < ARRAY_SIZE (i.prefix); k++)
+ /* REX byte is encoded in VEX/EVEX prefix. */
+ if (i.prefix[k] && (k != REX_PREFIX || !i.vex.length))
+ count++;
+
+ /* Count prefixes for extended opcode maps. */
+ if (!i.vex.length)
+ switch (i.tm.opcode_length)
+ {
+ case 3:
+ if (((i.tm.base_opcode >> 16) & 0xff) == 0xf)
+ {
+ count++;
+ switch ((i.tm.base_opcode >> 8) & 0xff)
+ {
+ case 0x38:
+ case 0x3a:
+ count++;
+ break;
+ default:
+ break;
+ }
+ }
+ break;
+ case 2:
+ if (((i.tm.base_opcode >> 8) & 0xff) == 0xf)
+ count++;
+ break;
+ case 1:
+ break;
+ default:
+ abort ();
+ }
+
+ if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype)
+ == BRANCH_PREFIX)
+ {
+ /* Set the maximum prefix size in BRANCH_PREFIX
+ frag. */
+ if (fragP->tc_frag_data.max_bytes > max)
+ fragP->tc_frag_data.max_bytes = max;
+ if (fragP->tc_frag_data.max_bytes > count)
+ fragP->tc_frag_data.max_bytes -= count;
+ else
+ fragP->tc_frag_data.max_bytes = 0;
+ }
+ else
+ {
+ /* Remember the maximum prefix size in FUSED_JCC_PADDING
+ frag. */
+ unsigned int max_prefix_size;
+ if (align_branch_prefix_size > max)
+ max_prefix_size = max;
+ else
+ max_prefix_size = align_branch_prefix_size;
+ if (max_prefix_size > count)
+ fragP->tc_frag_data.max_prefix_length
+ = max_prefix_size - count;
+ }
+
+ /* Use existing segment prefix if possible. Use CS
+ segment prefix in 64-bit mode. In 32-bit mode, use SS
+ segment prefix with ESP/EBP base register and use DS
+ segment prefix without ESP/EBP base register. */
+ if (i.prefix[SEG_PREFIX])
+ fragP->tc_frag_data.default_prefix = i.prefix[SEG_PREFIX];
+ else if (flag_code == CODE_64BIT)
+ fragP->tc_frag_data.default_prefix = CS_PREFIX_OPCODE;
+ else if (i.base_reg
+ && (i.base_reg->reg_num == 4
+ || i.base_reg->reg_num == 5))
+ fragP->tc_frag_data.default_prefix = SS_PREFIX_OPCODE;
+ else
+ fragP->tc_frag_data.default_prefix = DS_PREFIX_OPCODE;
+ }
}
}
+ /* NB: Don't work with COND_JUMP86 without i386. */
+ if (align_branch_power
+ && now_seg != absolute_section
+ && cpu_arch_flags.bitfield.cpui386)
+ {
+ /* Terminate each frag so that we can add prefix and check for
+ fused jcc. */
+ frag_wane (frag_now);
+ frag_new (0);
+ }
+
#ifdef DEBUG386
if (flag_debug)
{
@@ -8588,6 +9026,7 @@ output_disp (fragS *insn_start_frag, offsetT insn_start_off)
if (!object_64bit)
{
reloc_type = BFD_RELOC_386_GOTPC;
+ i.has_gotpc_tls_reloc = TRUE;
i.op[n].imms->X_add_number +=
encoding_length (insn_start_frag, insn_start_off, p);
}
@@ -8599,6 +9038,27 @@ output_disp (fragS *insn_start_frag, offsetT insn_start_off)
insn, and that is taken care of in other code. */
reloc_type = BFD_RELOC_X86_64_GOTPC32;
}
+ else if (align_branch_power)
+ {
+ switch (reloc_type)
+ {
+ case BFD_RELOC_386_TLS_GD:
+ case BFD_RELOC_386_TLS_LDM:
+ case BFD_RELOC_386_TLS_IE:
+ case BFD_RELOC_386_TLS_IE_32:
+ case BFD_RELOC_386_TLS_GOTIE:
+ case BFD_RELOC_386_TLS_GOTDESC:
+ case BFD_RELOC_386_TLS_DESC_CALL:
+ case BFD_RELOC_X86_64_TLSGD:
+ case BFD_RELOC_X86_64_TLSLD:
+ case BFD_RELOC_X86_64_GOTTPOFF:
+ case BFD_RELOC_X86_64_GOTPC32_TLSDESC:
+ case BFD_RELOC_X86_64_TLSDESC_CALL:
+ i.has_gotpc_tls_reloc = TRUE;
+ default:
+ break;
+ }
+ }
fixP = fix_new_exp (frag_now, p - frag_now->fr_literal,
size, i.op[n].disps, pcrel,
reloc_type);
@@ -8740,6 +9200,7 @@ output_imm (fragS *insn_start_frag, offsetT insn_start_off)
reloc_type = BFD_RELOC_X86_64_GOTPC32;
else if (size == 8)
reloc_type = BFD_RELOC_X86_64_GOTPC64;
+ i.has_gotpc_tls_reloc = TRUE;
i.op[n].imms->X_add_number +=
encoding_length (insn_start_frag, insn_start_off, p);
}
@@ -10365,6 +10826,362 @@ elf_symbol_resolved_in_segment_p (symbolS *fr_symbol, offsetT fr_var)
}
#endif
+/* Return the next non-empty frag. */
+
+static fragS *
+i386_next_non_empty_frag (fragS *fragP)
+{
+ /* There may be a frag with a ".fill 0" when there is no room in
+ the current frag for frag_grow in output_insn. */
+ for (fragP = fragP->fr_next;
+ (fragP != NULL
+ && fragP->fr_type == rs_fill
+ && fragP->fr_fix == 0);
+ fragP = fragP->fr_next)
+ ;
+ return fragP;
+}
+
+/* Return the next jcc frag after BRANCH_PADDING. */
+
+static fragS *
+i386_next_jcc_frag (fragS *fragP)
+{
+ if (!fragP)
+ return NULL;
+
+ if (fragP->fr_type == rs_machine_dependent
+ && (TYPE_FROM_RELAX_STATE (fragP->fr_subtype)
+ == BRANCH_PADDING))
+ {
+ fragP = i386_next_non_empty_frag (fragP);
+ if (fragP->fr_type != rs_machine_dependent)
+ return NULL;
+ if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == COND_JUMP)
+ return fragP;
+ }
+
+ return NULL;
+}
+
+/* Classify BRANCH_PADDING, BRANCH_PREFIX and FUSED_JCC_PADDING frags. */
+
+static void
+i386_classify_machine_dependent_frag (fragS *fragP)
+{
+ fragS *cmp_fragP;
+ fragS *pad_fragP;
+ fragS *branch_fragP;
+ fragS *next_fragP;
+ unsigned int max_prefix_length;
+
+ if (fragP->tc_frag_data.classified)
+ return;
+
+ /* First scan for BRANCH_PADDING and FUSED_JCC_PADDING. Convert
+ FUSED_JCC_PADDING and merge BRANCH_PADDING. */
+ for (next_fragP = fragP;
+ next_fragP != NULL;
+ next_fragP = next_fragP->fr_next)
+ {
+ next_fragP->tc_frag_data.classified = 1;
+ if (next_fragP->fr_type == rs_machine_dependent)
+ switch (TYPE_FROM_RELAX_STATE (next_fragP->fr_subtype))
+ {
+ case BRANCH_PADDING:
+ /* The BRANCH_PADDING frag must be followed by a branch
+ frag. */
+ branch_fragP = i386_next_non_empty_frag (next_fragP);
+ next_fragP->tc_frag_data.u.branch_fragP = branch_fragP;
+ break;
+ case FUSED_JCC_PADDING:
+ /* Check if this is a fused jcc:
+ FUSED_JCC_PADDING
+ CMP like instruction
+ BRANCH_PADDING
+ COND_JUMP
+ */
+ cmp_fragP = i386_next_non_empty_frag (next_fragP);
+ pad_fragP = i386_next_non_empty_frag (cmp_fragP);
+ branch_fragP = i386_next_jcc_frag (pad_fragP);
+ if (branch_fragP)
+ {
+ /* The BRANCH_PADDING frag is merged with the
+ FUSED_JCC_PADDING frag. */
+ next_fragP->tc_frag_data.u.branch_fragP = branch_fragP;
+ /* CMP like instruction size. */
+ next_fragP->tc_frag_data.cmp_size = cmp_fragP->fr_fix;
+ frag_wane (pad_fragP);
+ /* Skip to branch_fragP. */
+ next_fragP = branch_fragP;
+ }
+ else if (next_fragP->tc_frag_data.max_prefix_length)
+ {
+ /* Turn FUSED_JCC_PADDING into BRANCH_PREFIX if it isn't
+ a fused jcc. */
+ next_fragP->fr_subtype
+ = ENCODE_RELAX_STATE (BRANCH_PREFIX, 0);
+ next_fragP->tc_frag_data.max_bytes
+ = next_fragP->tc_frag_data.max_prefix_length;
+ /* This will be updated in the BRANCH_PREFIX scan. */
+ next_fragP->tc_frag_data.max_prefix_length = 0;
+ }
+ else
+ frag_wane (next_fragP);
+ break;
+ }
+ }
+
+ /* Stop if there is no BRANCH_PREFIX. */
+ if (!align_branch_prefix_size)
+ return;
+
+ /* Scan for BRANCH_PREFIX. */
+ for (; fragP != NULL; fragP = fragP->fr_next)
+ {
+ if (fragP->fr_type != rs_machine_dependent
+ || (TYPE_FROM_RELAX_STATE (fragP->fr_subtype)
+ != BRANCH_PREFIX))
+ continue;
+
+ /* Count all BRANCH_PREFIX frags before BRANCH_PADDING and
+ COND_JUMP_PREFIX. */
+ max_prefix_length = 0;
+ for (next_fragP = fragP;
+ next_fragP != NULL;
+ next_fragP = next_fragP->fr_next)
+ {
+ if (next_fragP->fr_type == rs_fill)
+ /* Skip rs_fill frags. */
+ continue;
+ else if (next_fragP->fr_type != rs_machine_dependent)
+ /* Stop for all other frags. */
+ break;
+
+ /* rs_machine_dependent frags. */
+ if (TYPE_FROM_RELAX_STATE (next_fragP->fr_subtype)
+ == BRANCH_PREFIX)
+ {
+ /* Count BRANCH_PREFIX frags. */
+ if (max_prefix_length >= MAX_FUSED_JCC_PADDING_SIZE)
+ {
+ max_prefix_length = MAX_FUSED_JCC_PADDING_SIZE;
+ frag_wane (next_fragP);
+ }
+ else
+ max_prefix_length
+ += next_fragP->tc_frag_data.max_bytes;
+ }
+ else if ((TYPE_FROM_RELAX_STATE (next_fragP->fr_subtype)
+ == BRANCH_PADDING)
+ || (TYPE_FROM_RELAX_STATE (next_fragP->fr_subtype)
+ == FUSED_JCC_PADDING))
+ {
+ /* Stop at BRANCH_PADDING and FUSED_JCC_PADDING. */
+ fragP->tc_frag_data.u.padding_fragP = next_fragP;
+ break;
+ }
+ else
+ /* Stop for other rs_machine_dependent frags. */
+ break;
+ }
+
+ fragP->tc_frag_data.max_prefix_length = max_prefix_length;
+
+ /* Skip to the next frag. */
+ fragP = next_fragP;
+ }
+}
+
+/* Compute padding size for
+
+ FUSED_JCC_PADDING
+ CMP like instruction
+ BRANCH_PADDING
+ COND_JUMP/UNCOND_JUMP
+
+ or
+
+ BRANCH_PADDING
+ COND_JUMP/UNCOND_JUMP
+ */
+
+static int
+i386_branch_padding_size (fragS *fragP, offsetT address)
+{
+ unsigned int offset, size, padding_size;
+ fragS *branch_fragP = fragP->tc_frag_data.u.branch_fragP;
+
+ /* The start address of the BRANCH_PADDING or FUSED_JCC_PADDING frag. */
+ if (!address)
+ address = fragP->fr_address;
+ address += fragP->fr_fix;
+
+ /* CMP like instrunction size. */
+ size = fragP->tc_frag_data.cmp_size;
+
+ /* The base size of the branch frag. */
+ size += branch_fragP->fr_fix;
+
+ /* Add opcode and displacement bytes for the rs_machine_dependent
+ branch frag. */
+ if (branch_fragP->fr_type == rs_machine_dependent)
+ size += md_relax_table[branch_fragP->fr_subtype].rlx_length;
+
+ /* Check if branch is within boundary and doesn't end at the last
+ byte. */
+ offset = address & ((1U << align_branch_power) - 1);
+ if ((offset + size) >= (1U << align_branch_power))
+ /* Padding needed to avoid crossing boundary. */
+ padding_size = (1U << align_branch_power) - offset;
+ else
+ /* No padding needed. */
+ padding_size = 0;
+
+ /* The return value may be saved in tc_frag_data.length which is
+ unsigned byte. */
+ if (!fits_in_unsigned_byte (padding_size))
+ abort ();
+
+ return padding_size;
+}
+
+/* i386_generic_table_relax_frag()
+
+ Handle BRANCH_PADDING, BRANCH_PREFIX and FUSED_JCC_PADDING frags to
+ grow/shrink padding to align branch frags. Hand others to
+ relax_frag(). */
+
+long
+i386_generic_table_relax_frag (segT segment, fragS *fragP, long stretch)
+{
+ if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PADDING
+ || TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == FUSED_JCC_PADDING)
+ {
+ long padding_size = i386_branch_padding_size (fragP, 0);
+ long grow = padding_size - fragP->tc_frag_data.length;
+
+ /* When the BRANCH_PREFIX frag is used, the computed address
+ must match the actual address and there should be no padding. */
+ if (fragP->tc_frag_data.padding_address
+ && (fragP->tc_frag_data.padding_address != fragP->fr_address
+ || padding_size))
+ abort ();
+
+ /* Update the padding size. */
+ if (grow)
+ fragP->tc_frag_data.length = padding_size;
+
+ return grow;
+ }
+ else if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PREFIX)
+ {
+ fragS *padding_fragP, *next_fragP;
+ long padding_size, left_size, last_size;
+
+ padding_fragP = fragP->tc_frag_data.u.padding_fragP;
+ if (!padding_fragP)
+ /* Use the padding set by the leading BRANCH_PREFIX frag. */
+ return (fragP->tc_frag_data.length
+ - fragP->tc_frag_data.last_length);
+
+ /* Compute the relative address of the padding frag in the very
+ first time where the BRANCH_PREFIX frag sizes are zero. */
+ if (!fragP->tc_frag_data.padding_address)
+ fragP->tc_frag_data.padding_address
+ = padding_fragP->fr_address - (fragP->fr_address - stretch);
+
+ /* First update the last length from the previous interation. */
+ left_size = fragP->tc_frag_data.prefix_length;
+ for (next_fragP = fragP;
+ next_fragP != padding_fragP;
+ next_fragP = next_fragP->fr_next)
+ if (next_fragP->fr_type == rs_machine_dependent
+ && (TYPE_FROM_RELAX_STATE (next_fragP->fr_subtype)
+ == BRANCH_PREFIX))
+ {
+ if (left_size)
+ {
+ int max = next_fragP->tc_frag_data.max_bytes;
+ if (max)
+ {
+ int size;
+ if (max > left_size)
+ size = left_size;
+ else
+ size = max;
+ left_size -= size;
+ next_fragP->tc_frag_data.last_length = size;
+ }
+ }
+ else
+ next_fragP->tc_frag_data.last_length = 0;
+ }
+
+ /* Check the padding size for the padding frag. */
+ padding_size = i386_branch_padding_size
+ (padding_fragP, (fragP->fr_address
+ + fragP->tc_frag_data.padding_address));
+
+ last_size = fragP->tc_frag_data.prefix_length;
+ /* Check if there is change from the last interation. */
+ if (padding_size == last_size)
+ {
+ /* Update the expected address of the padding frag. */
+ padding_fragP->tc_frag_data.padding_address
+ = (fragP->fr_address + padding_size
+ + fragP->tc_frag_data.padding_address);
+ return 0;
+ }
+
+ if (padding_size > fragP->tc_frag_data.max_prefix_length)
+ {
+ /* No padding if there is no sufficient room. Clear the
+ expected address of the padding frag. */
+ padding_fragP->tc_frag_data.padding_address = 0;
+ padding_size = 0;
+ }
+ else
+ /* Store the expected address of the padding frag. */
+ padding_fragP->tc_frag_data.padding_address
+ = (fragP->fr_address + padding_size
+ + fragP->tc_frag_data.padding_address);
+
+ fragP->tc_frag_data.prefix_length = padding_size;
+
+ /* Update the length for the current interation. */
+ left_size = padding_size;
+ for (next_fragP = fragP;
+ next_fragP != padding_fragP;
+ next_fragP = next_fragP->fr_next)
+ if (next_fragP->fr_type == rs_machine_dependent
+ && (TYPE_FROM_RELAX_STATE (next_fragP->fr_subtype)
+ == BRANCH_PREFIX))
+ {
+ if (left_size)
+ {
+ int max = next_fragP->tc_frag_data.max_bytes;
+ if (max)
+ {
+ int size;
+ if (max > left_size)
+ size = left_size;
+ else
+ size = max;
+ left_size -= size;
+ next_fragP->tc_frag_data.length = size;
+ }
+ }
+ else
+ next_fragP->tc_frag_data.length = 0;
+ }
+
+ return (fragP->tc_frag_data.length
+ - fragP->tc_frag_data.last_length);
+ }
+ return relax_frag (segment, fragP, stretch);
+}
+
/* md_estimate_size_before_relax()
Called just before relax() for rs_machine_dependent frags. The x86
@@ -10381,6 +11198,14 @@ elf_symbol_resolved_in_segment_p (symbolS *fr_symbol, offsetT fr_var)
int
md_estimate_size_before_relax (fragS *fragP, segT segment)
{
+ if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PADDING
+ || TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PREFIX
+ || TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == FUSED_JCC_PADDING)
+ {
+ i386_classify_machine_dependent_frag (fragP);
+ return fragP->tc_frag_data.length;
+ }
+
/* We've already got fragP->fr_subtype right; all we have to do is
check for un-relaxable symbols. On an ELF system, we can't relax
an externally visible symbol, because it may be overridden by a
@@ -10514,6 +11339,106 @@ md_convert_frag (bfd *abfd ATTRIBUTE_UNUSED, segT sec ATTRIBUTE_UNUSED,
unsigned int extension = 0;
offsetT displacement_from_opcode_start;
+ if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PADDING
+ || TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == FUSED_JCC_PADDING
+ || TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PREFIX)
+ {
+ /* Generate nop padding. */
+ unsigned int size = fragP->tc_frag_data.length;
+ if (size)
+ {
+ if (size > fragP->tc_frag_data.max_bytes)
+ abort ();
+
+ if (flag_debug)
+ {
+ const char *msg;
+ const char *branch = "branch";
+ const char *prefix = "";
+ fragS *padding_fragP;
+ if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype)
+ == BRANCH_PREFIX)
+ {
+ padding_fragP = fragP->tc_frag_data.u.padding_fragP;
+ switch (fragP->tc_frag_data.default_prefix)
+ {
+ default:
+ abort ();
+ break;
+ case CS_PREFIX_OPCODE:
+ prefix = " cs";
+ break;
+ case DS_PREFIX_OPCODE:
+ prefix = " ds";
+ break;
+ case ES_PREFIX_OPCODE:
+ prefix = " es";
+ break;
+ case FS_PREFIX_OPCODE:
+ prefix = " fs";
+ break;
+ case GS_PREFIX_OPCODE:
+ prefix = " gs";
+ break;
+ case SS_PREFIX_OPCODE:
+ prefix = " ss";
+ break;
+ }
+ if (padding_fragP)
+ msg = _("%s:%u: add %d%s at 0x%llx to align "
+ "%s within %d-byte boundary\n");
+ else
+ msg = _("%s:%u: add additional %d%s at 0x%llx to "
+ "align %s within %d-byte boundary\n");
+ }
+ else
+ {
+ padding_fragP = fragP;
+ msg = _("%s:%u: add %d%s-byte nop at 0x%llx to align "
+ "%s within %d-byte boundary\n");
+ }
+
+ if (padding_fragP)
+ switch (padding_fragP->tc_frag_data.branch_type)
+ {
+ case align_branch_jcc:
+ branch = "jcc";
+ break;
+ case align_branch_fused:
+ branch = "fused jcc";
+ break;
+ case align_branch_jmp:
+ branch = "jmp";
+ break;
+ case align_branch_call:
+ branch = "call";
+ break;
+ case align_branch_indirect:
+ branch = "indiret branch";
+ break;
+ case align_branch_ret:
+ branch = "ret";
+ break;
+ default:
+ break;
+ }
+
+ fprintf (stdout, msg,
+ fragP->fr_file, fragP->fr_line, size, prefix,
+ (long long) fragP->fr_address, branch,
+ 1 << align_branch_power);
+ }
+ if (TYPE_FROM_RELAX_STATE (fragP->fr_subtype) == BRANCH_PREFIX)
+ memset (fragP->fr_opcode,
+ fragP->tc_frag_data.default_prefix, size);
+ else
+ i386_generate_nops (fragP, (char *) fragP->fr_opcode,
+ size, 0);
+ fragP->fr_fix += size;
+ }
+ return;
+ }
+
opcode = (unsigned char *) fragP->fr_opcode;
/* Address we want to reach in file space. */
@@ -11072,6 +11997,9 @@ const char *md_shortopts = "qnO::";
#define OPTION_MFENCE_AS_LOCK_ADD (OPTION_MD_BASE + 24)
#define OPTION_X86_USED_NOTE (OPTION_MD_BASE + 25)
#define OPTION_MVEXWIG (OPTION_MD_BASE + 26)
+#define OPTION_MALIGN_BRANCH_BOUNDARY (OPTION_MD_BASE + 27)
+#define OPTION_MALIGN_BRANCH_PREFIX_SIZE (OPTION_MD_BASE + 28)
+#define OPTION_MALIGN_BRANCH (OPTION_MD_BASE + 29)
struct option md_longopts[] =
{
@@ -11107,6 +12035,9 @@ struct option md_longopts[] =
{"mfence-as-lock-add", required_argument, NULL, OPTION_MFENCE_AS_LOCK_ADD},
{"mrelax-relocations", required_argument, NULL, OPTION_MRELAX_RELOCATIONS},
{"mevexrcig", required_argument, NULL, OPTION_MEVEXRCIG},
+ {"malign-branch-boundary", required_argument, NULL, OPTION_MALIGN_BRANCH_BOUNDARY},
+ {"malign-branch-prefix-size", required_argument, NULL, OPTION_MALIGN_BRANCH_PREFIX_SIZE},
+ {"malign-branch", required_argument, NULL, OPTION_MALIGN_BRANCH},
{"mamd64", no_argument, NULL, OPTION_MAMD64},
{"mintel64", no_argument, NULL, OPTION_MINTEL64},
{NULL, no_argument, NULL, 0}
@@ -11117,7 +12048,7 @@ int
md_parse_option (int c, const char *arg)
{
unsigned int j;
- char *arch, *next, *saved;
+ char *arch, *next, *saved, *type;
switch (c)
{
@@ -11495,6 +12426,80 @@ md_parse_option (int c, const char *arg)
as_fatal (_("invalid -mrelax-relocations= option: `%s'"), arg);
break;
+ case OPTION_MALIGN_BRANCH_BOUNDARY:
+ {
+ char *end;
+ long int align = strtoul (arg, &end, 0);
+ if (*end == '\0')
+ {
+ if (align == 0)
+ {
+ align_branch_power = 0;
+ break;
+ }
+ else if (align >= 16)
+ {
+ int align_power;
+ for (align_power = 0;
+ (align & 1) == 0;
+ align >>= 1, align_power++)
+ continue;
+ /* Limit alignment power to 31. */
+ if (align == 1 && align_power < 32)
+ {
+ align_branch_power = align_power;
+ break;
+ }
+ }
+ }
+ as_fatal (_("invalid -malign-branch-boundary= value: %s"), arg);
+ }
+ break;
+
+ case OPTION_MALIGN_BRANCH_PREFIX_SIZE:
+ {
+ char *end;
+ int align = strtoul (arg, &end, 0);
+ /* Some processors only support 5 prefixes. */
+ if (*end == '\0' && align >= 0 && align < 6)
+ {
+ align_branch_prefix_size = align;
+ break;
+ }
+ as_fatal (_("invalid -malign-branch-prefix-size= value: %s"),
+ arg);
+ }
+ break;
+
+ case OPTION_MALIGN_BRANCH:
+ align_branch = 0;
+ saved = xstrdup (arg);
+ type = saved;
+ do
+ {
+ next = strchr (type, '+');
+ if (next)
+ *next++ = '\0';
+ if (strcasecmp (type, "jcc") == 0)
+ align_branch |= align_branch_jcc_bit;
+ else if (strcasecmp (type, "fused") == 0)
+ align_branch |= align_branch_fused_bit;
+ else if (strcasecmp (type, "jmp") == 0)
+ align_branch |= align_branch_jmp_bit;
+ else if (strcasecmp (type, "call") == 0)
+ align_branch |= align_branch_call_bit;
+ else if (strcasecmp (type, "ret") == 0)
+ align_branch |= align_branch_ret_bit;
+ else if (strcasecmp (type, "indirect") == 0)
+ align_branch |= align_branch_indirect_bit;
+ else
+ as_fatal (_("invalid -malign-branch= option: `%s'"), arg);
+ type = next;
+ }
+ while (next != NULL);
+ free (saved);
+ break;
+
case OPTION_MAMD64:
intel64 = 0;
break;
@@ -11747,6 +12752,17 @@ md_show_usage (FILE *stream)
fprintf (stream, _("\
generate relax relocations\n"));
fprintf (stream, _("\
+ -malign-branch-boundary=NUM (default: 0)\n\
+ align branches within NUM byte boundary\n"));
+ fprintf (stream, _("\
+ -malign-branch=TYPE[+TYPE...] (default: jcc+fused+jmp)\n\
+ TYPE is combination of jcc, fused, jmp, call, ret,\n\
+ indirect\n\
+ specify types of branches to align\n"));
+ fprintf (stream, _("\
+ -malign-branch-prefix-size=NUM (default: 5)\n\
+ align branches with NUM prefixes per instruction\n"));
+ fprintf (stream, _("\
-mamd64 accept only AMD64 ISA [default]\n"));
fprintf (stream, _("\
-mintel64 accept only Intel64 ISA\n"));
@@ -11830,15 +12846,24 @@ i386_target_format (void)
{
default:
format = ELF_TARGET_FORMAT;
+#ifndef TE_SOLARIS
+ tls_get_addr = "___tls_get_addr";
+#endif
break;
case X86_64_ABI:
use_rela_relocations = 1;
object_64bit = 1;
+#ifndef TE_SOLARIS
+ tls_get_addr = "__tls_get_addr";
+#endif
format = ELF_TARGET_FORMAT64;
break;
case X86_64_X32_ABI:
use_rela_relocations = 1;
object_64bit = 1;
+#ifndef TE_SOLARIS
+ tls_get_addr = "__tls_get_addr";
+#endif
disallow_64bit_reloc = 1;
format = ELF_TARGET_FORMAT32;
break;
@@ -11955,6 +12980,21 @@ s_bss (int ignore ATTRIBUTE_UNUSED)
#endif
+/* Remember constant directive. */
+
+void
+i386_cons_align (int ignore ATTRIBUTE_UNUSED)
+{
+ if (last_insn.kind != last_insn_directive
+ && (bfd_section_flags (now_seg) & SEC_CODE))
+ {
+ last_insn.seg = now_seg;
+ last_insn.kind = last_insn_directive;
+ last_insn.name = "constant directive";
+ last_insn.file = as_where (&last_insn.line);
+ }
+}
+
void
i386_validate_fix (fixS *fixp)
{
diff --git a/gas/config/tc-i386.h b/gas/config/tc-i386.h
index b02a256..28f03cf 100644
--- a/gas/config/tc-i386.h
+++ b/gas/config/tc-i386.h
@@ -210,12 +210,19 @@ if ((n) \
#define MAX_MEM_FOR_RS_ALIGN_CODE (alignment ? ((1 << alignment) - 1) : 1)
+extern void i386_cons_align (int);
+#define md_cons_align(nbytes) i386_cons_align (nbytes)
+
void i386_print_statistics (FILE *);
#define tc_print_statistics i386_print_statistics
extern unsigned int i386_frag_max_var (fragS *);
#define md_frag_max_var i386_frag_max_var
+extern long i386_generic_table_relax_frag (segT, fragS *, long);
+#define md_generic_table_relax_frag(segment, fragP, stretch) \
+ i386_generic_table_relax_frag (segment, fragP, stretch)
+
#define md_number_to_chars number_to_chars_littleendian
enum processor_type
@@ -250,10 +257,24 @@ extern i386_cpu_flags cpu_arch_isa_flags;
struct i386_tc_frag_data
{
+ union
+ {
+ fragS *padding_fragP;
+ fragS *branch_fragP;
+ } u;
+ addressT padding_address;
enum processor_type isa;
i386_cpu_flags isa_flags;
enum processor_type tune;
unsigned int max_bytes;
+ unsigned char length;
+ unsigned char last_length;
+ unsigned char max_prefix_length;
+ unsigned char prefix_length;
+ unsigned char default_prefix;
+ unsigned char cmp_size;
+ unsigned int classified : 1;
+ unsigned int branch_type : 3;
};
/* We need to emit the right NOP pattern in .align frags. This is
@@ -264,10 +285,20 @@ struct i386_tc_frag_data
#define TC_FRAG_INIT(FRAGP, MAX_BYTES) \
do \
{ \
+ (FRAGP)->tc_frag_data.u.padding_fragP = NULL; \
+ (FRAGP)->tc_frag_data.padding_address = 0; \
(FRAGP)->tc_frag_data.isa = cpu_arch_isa; \
(FRAGP)->tc_frag_data.isa_flags = cpu_arch_isa_flags; \
(FRAGP)->tc_frag_data.tune = cpu_arch_tune; \
(FRAGP)->tc_frag_data.max_bytes = (MAX_BYTES); \
+ (FRAGP)->tc_frag_data.length = 0; \
+ (FRAGP)->tc_frag_data.last_length = 0; \
+ (FRAGP)->tc_frag_data.max_prefix_length = 0; \
+ (FRAGP)->tc_frag_data.prefix_length = 0; \
+ (FRAGP)->tc_frag_data.default_prefix = 0; \
+ (FRAGP)->tc_frag_data.cmp_size = 0; \
+ (FRAGP)->tc_frag_data.classified = 0; \
+ (FRAGP)->tc_frag_data.branch_type = 0; \
} \
while (0)
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 589b426..74296e6 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -423,6 +423,32 @@ R_X86_64_REX_GOTPCRELX, in 64-bit mode.
relocations. The default can be controlled by a configure option
@option{--enable-x86-relax-relocations}.
+@cindex @samp{-malign-branch-boundary=} option, i386
+@cindex @samp{-malign-branch-boundary=} option, x86-64
+@item -malign-branch-boundary=@var{NUM}
+This option controls how the assembler should align branches with segment
+prefixes or NOP. @var{NUM} must be a power of 2. It should be 0 or
+no less than 16. Branches will be aligned within @var{NUM} byte
+boundary. @option{-malign-branch-boundary=0}, which is the default,
+doesn't align branches.
+
+@cindex @samp{-malign-branch=} option, i386
+@cindex @samp{-malign-branch=} option, x86-64
+@item -malign-branch=@var{TYPE}[+@var{TYPE}...]
+This option specifies types of branches to align. @var{TYPE} is
+combination of @samp{jcc}, which aligns conditional jumps,
+@samp{fused}, which aligns fused conditional jumps, @samp{jmp},
+which aligns unconditional jumps, @samp{call} which aligns calls,
+@samp{ret}, which aligns rets, @samp{indirect}, which aligns indirect
+jumps and calls. The default is @option{-malign-branch=jcc+fused+jmp}.
+
+@cindex @samp{-malign-branch-prefix-size=} option, i386
+@cindex @samp{-malign-branch-prefix-size=} option, x86-64
+@item -malign-branch-prefix-size=@var{NUM}
+This option specifies the maximum number of prefixes on an instruction
+to align branches. @var{NUM} should be between 0 and 5. The default
+@var{NUM} is 5.
+
@cindex @samp{-mx86-used-note=} option, i386
@cindex @samp{-mx86-used-note=} option, x86-64
@item -mx86-used-note=@var{no}