From f3a3d0d39fab02f3b8d2206a5f46e5eb2f8e9b05 Mon Sep 17 00:00:00 2001 From: Richard Henderson Date: Sat, 2 Oct 1999 11:07:49 -0700 Subject: * md.texi (define_peephole2): New section. From-SVN: r29772 --- gcc/ChangeLog | 4 + gcc/md.texi | 527 +++++++++++++++++++++++++++++++++++----------------------- 2 files changed, 319 insertions(+), 212 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 901ac95..9af4018 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,7 @@ +Sat Oct 2 11:06:31 1999 Richard Henderson + + * md.texi (define_peephole2): New section. + Sat Oct 2 10:57:56 1999 Jan Hubicka * i386.md (mov?i patterns): Fix handling of TARGET_USE_MOV0 diff --git a/gcc/md.texi b/gcc/md.texi index bb7aeac..8242ea0 100644 --- a/gcc/md.texi +++ b/gcc/md.texi @@ -32,10 +32,10 @@ See the next chapter for information on the C header file. * Dependent Patterns:: Having one pattern may make you need another. * Jump Patterns:: Special considerations for patterns for jump insns. * Insn Canonicalizations::Canonicalization of Instructions -* Peephole Definitions::Defining machine-specific peephole optimizations. * Expander Definitions::Generating a sequence of several RTL insns - for a standard operation. -* Insn Splitting:: Splitting Instructions into Multiple Instructions + for a standard operation. +* Insn Splitting:: Splitting Instructions into Multiple Instructions. +* Peephole Definitions::Defining machine-specific peephole optimizations. * Insn Attributes:: Specifying the value of attributes for generated insns. @end menu @@ -2907,210 +2907,6 @@ will be written using @code{zero_extract} rather than the equivalent @end itemize -@node Peephole Definitions -@section Machine-Specific Peephole Optimizers -@cindex peephole optimizer definitions -@cindex defining peephole optimizers - -In addition to instruction patterns the @file{md} file may contain -definitions of machine-specific peephole optimizations. 
- -The combiner does not notice certain peephole optimizations when the data -flow in the program does not suggest that it should try them. For example, -sometimes two consecutive insns related in purpose can be combined even -though the second one does not appear to use a register computed in the -first one. A machine-specific peephole optimizer can detect such -opportunities. - -@need 1000 -A definition looks like this: - -@smallexample -(define_peephole - [@var{insn-pattern-1} - @var{insn-pattern-2} - @dots{}] - "@var{condition}" - "@var{template}" - "@var{optional insn-attributes}") -@end smallexample - -@noindent -The last string operand may be omitted if you are not using any -machine-specific information in this machine description. If present, -it must obey the same rules as in a @code{define_insn}. - -In this skeleton, @var{insn-pattern-1} and so on are patterns to match -consecutive insns. The optimization applies to a sequence of insns when -@var{insn-pattern-1} matches the first one, @var{insn-pattern-2} matches -the next, and so on.@refill - -Each of the insns matched by a peephole must also match a -@code{define_insn}. Peepholes are checked only at the last stage just -before code generation, and only optionally. Therefore, any insn which -would match a peephole but no @code{define_insn} will cause a crash in code -generation in an unoptimized compilation, or at various optimization -stages. - -The operands of the insns are matched with @code{match_operands}, -@code{match_operator}, and @code{match_dup}, as usual. What is not -usual is that the operand numbers apply to all the insn patterns in the -definition. So, you can check for identical operands in two insns by -using @code{match_operand} in one insn and @code{match_dup} in the -other. 
- -The operand constraints used in @code{match_operand} patterns do not have -any direct effect on the applicability of the peephole, but they will -be validated afterward, so make sure your constraints are general enough -to apply whenever the peephole matches. If the peephole matches -but the constraints are not satisfied, the compiler will crash. - -It is safe to omit constraints in all the operands of the peephole; or -you can write constraints which serve as a double-check on the criteria -previously tested. - -Once a sequence of insns matches the patterns, the @var{condition} is -checked. This is a C expression which makes the final decision whether to -perform the optimization (we do so if the expression is nonzero). If -@var{condition} is omitted (in other words, the string is empty) then the -optimization is applied to every sequence of insns that matches the -patterns. - -The defined peephole optimizations are applied after register allocation -is complete. Therefore, the peephole definition can check which -operands have ended up in which kinds of registers, just by looking at -the operands. - -@findex prev_active_insn -The way to refer to the operands in @var{condition} is to write -@code{operands[@var{i}]} for operand number @var{i} (as matched by -@code{(match_operand @var{i} @dots{})}). Use the variable @code{insn} -to refer to the last of the insns being matched; use -@code{prev_active_insn} to find the preceding insns. - -@findex dead_or_set_p -When optimizing computations with intermediate results, you can use -@var{condition} to match only when the intermediate results are not used -elsewhere. 
Use the C expression @code{dead_or_set_p (@var{insn}, -@var{op})}, where @var{insn} is the insn in which you expect the value -to be used for the last time (from the value of @code{insn}, together -with use of @code{prev_nonnote_insn}), and @var{op} is the intermediate -value (from @code{operands[@var{i}]}).@refill - -Applying the optimization means replacing the sequence of insns with one -new insn. The @var{template} controls ultimate output of assembler code -for this combined insn. It works exactly like the template of a -@code{define_insn}. Operand numbers in this template are the same ones -used in matching the original sequence of insns. - -The result of a defined peephole optimizer does not need to match any of -the insn patterns in the machine description; it does not even have an -opportunity to match them. The peephole optimizer definition itself serves -as the insn pattern to control how the insn is output. - -Defined peephole optimizers are run as assembler code is being output, -so the insns they produce are never combined or rearranged in any way. - -Here is an example, taken from the 68000 machine description: - -@smallexample -(define_peephole - [(set (reg:SI 15) (plus:SI (reg:SI 15) (const_int 4))) - (set (match_operand:DF 0 "register_operand" "=f") - (match_operand:DF 1 "register_operand" "ad"))] - "FP_REG_P (operands[0]) && ! 
FP_REG_P (operands[1])" - "* -@{ - rtx xoperands[2]; - xoperands[1] = gen_rtx (REG, SImode, REGNO (operands[1]) + 1); -#ifdef MOTOROLA - output_asm_insn (\"move.l %1,(sp)\", xoperands); - output_asm_insn (\"move.l %1,-(sp)\", operands); - return \"fmove.d (sp)+,%0\"; -#else - output_asm_insn (\"movel %1,sp@@\", xoperands); - output_asm_insn (\"movel %1,sp@@-\", operands); - return \"fmoved sp@@+,%0\"; -#endif -@} -") -@end smallexample - -@need 1000 -The effect of this optimization is to change - -@smallexample -@group -jbsr _foobar -addql #4,sp -movel d1,sp@@- -movel d0,sp@@- -fmoved sp@@+,fp0 -@end group -@end smallexample - -@noindent -into - -@smallexample -@group -jbsr _foobar -movel d1,sp@@ -movel d0,sp@@- -fmoved sp@@+,fp0 -@end group -@end smallexample - -@ignore -@findex CC_REVERSED -If a peephole matches a sequence including one or more jump insns, you must -take account of the flags such as @code{CC_REVERSED} which specify that the -condition codes are represented in an unusual manner. The compiler -automatically alters any ordinary conditional jumps which occur in such -situations, but the compiler cannot alter jumps which have been replaced by -peephole optimizations. So it is up to you to alter the assembler code -that the peephole produces. Supply C code to write the assembler output, -and in this C code check the condition code status flags and change the -assembler code as appropriate. -@end ignore - -@var{insn-pattern-1} and so on look @emph{almost} like the second -operand of @code{define_insn}. There is one important difference: the -second operand of @code{define_insn} consists of one or more RTX's -enclosed in square brackets. Usually, there is only one: then the same -action can be written as an element of a @code{define_peephole}. But -when there are multiple actions in a @code{define_insn}, they are -implicitly enclosed in a @code{parallel}. 
Then you must explicitly -write the @code{parallel}, and the square brackets within it, in the -@code{define_peephole}. Thus, if an insn pattern looks like this, - -@smallexample -(define_insn "divmodsi4" - [(set (match_operand:SI 0 "general_operand" "=d") - (div:SI (match_operand:SI 1 "general_operand" "0") - (match_operand:SI 2 "general_operand" "dmsK"))) - (set (match_operand:SI 3 "general_operand" "=d") - (mod:SI (match_dup 1) (match_dup 2)))] - "TARGET_68020" - "divsl%.l %2,%3:%0") -@end smallexample - -@noindent -then the way to mention this insn in a peephole is as follows: - -@smallexample -(define_peephole - [@dots{} - (parallel - [(set (match_operand:SI 0 "general_operand" "=d") - (div:SI (match_operand:SI 1 "general_operand" "0") - (match_operand:SI 2 "general_operand" "dmsK"))) - (set (match_operand:SI 3 "general_operand" "=d") - (mod:SI (match_dup 1) (match_dup 2)))]) - @dots{}] - @dots{}) -@end smallexample - @node Expander Definitions @section Defining RTL Sequences for Code Generation @cindex expander definitions @@ -3134,11 +2930,10 @@ A @code{define_expand} RTX has four operands: The name. Each @code{define_expand} must have a name, since the only use for it is to refer to it by name. -@findex define_peephole @item -The RTL template. This is just like the RTL template for a -@code{define_peephole} in that it is a vector of RTL expressions -each being one insn. +The RTL template. This is a vector of RTL expressions representing +a sequence of separate instructions. Unlike @code{define_insn}, there +is no implicit surrounding @code{PARALLEL}. @item The condition, a string containing a C expression. This expression is @@ -3333,7 +3128,7 @@ subexpression. However, in some other cases, such as performing an addition of a large constant in two insns on a RISC machine, the way to split the addition into two insns is machine-dependent. 
-@cindex define_split +@findex define_split The @code{define_split} definition tells the compiler how to split a complex insn into several simpler insns. It looks like this: @@ -3466,6 +3261,314 @@ insns that don't. Instead, write two separate @code{define_split} definitions, one for the insns that are valid and one for the insns that are not valid. +@node Peephole Definitions +@section Machine-Specific Peephole Optimizers +@cindex peephole optimizer definitions +@cindex defining peephole optimizers + +In addition to instruction patterns the @file{md} file may contain +definitions of machine-specific peephole optimizations. + +The combiner does not notice certain peephole optimizations when the data +flow in the program does not suggest that it should try them. For example, +sometimes two consecutive insns related in purpose can be combined even +though the second one does not appear to use a register computed in the +first one. A machine-specific peephole optimizer can detect such +opportunities. + +There are two forms of peephole definitions that may be used. The +original @code{define_peephole} is run at assembly output time to +match insns and substitute assembly text. Use of @code{define_peephole} +is deprecated. + +A newer @code{define_peephole2} matches insns and substitutes new +insns. The @code{peephole2} pass is run after register allocation +but before scheduling, which may result in much better code for +targets that do scheduling. 
+ +@menu +* define_peephole:: RTL to Text Peephole Optimizers +* define_peephole2:: RTL to RTL Peephole Optimizers +@end menu + +@node define_peephole +@subsection RTL to Text Peephole Optimizers +@findex define_peephole + +@need 1000 +A definition looks like this: + +@smallexample +(define_peephole + [@var{insn-pattern-1} + @var{insn-pattern-2} + @dots{}] + "@var{condition}" + "@var{template}" + "@var{optional insn-attributes}") +@end smallexample + +@noindent +The last string operand may be omitted if you are not using any +machine-specific information in this machine description. If present, +it must obey the same rules as in a @code{define_insn}. + +In this skeleton, @var{insn-pattern-1} and so on are patterns to match +consecutive insns. The optimization applies to a sequence of insns when +@var{insn-pattern-1} matches the first one, @var{insn-pattern-2} matches +the next, and so on.@refill + +Each of the insns matched by a peephole must also match a +@code{define_insn}. Peepholes are checked only at the last stage just +before code generation, and only optionally. Therefore, any insn which +would match a peephole but no @code{define_insn} will cause a crash in code +generation in an unoptimized compilation, or at various optimization +stages. + +The operands of the insns are matched with @code{match_operand}, +@code{match_operator}, and @code{match_dup}, as usual. What is not +usual is that the operand numbers apply to all the insn patterns in the +definition. So, you can check for identical operands in two insns by +using @code{match_operand} in one insn and @code{match_dup} in the +other. + +The operand constraints used in @code{match_operand} patterns do not have +any direct effect on the applicability of the peephole, but they will +be validated afterward, so make sure your constraints are general enough +to apply whenever the peephole matches. If the peephole matches +but the constraints are not satisfied, the compiler will crash. 
+ +It is safe to omit constraints in all the operands of the peephole; or +you can write constraints which serve as a double-check on the criteria +previously tested. + +Once a sequence of insns matches the patterns, the @var{condition} is +checked. This is a C expression which makes the final decision whether to +perform the optimization (we do so if the expression is nonzero). If +@var{condition} is omitted (in other words, the string is empty) then the +optimization is applied to every sequence of insns that matches the +patterns. + +The defined peephole optimizations are applied after register allocation +is complete. Therefore, the peephole definition can check which +operands have ended up in which kinds of registers, just by looking at +the operands. + +@findex prev_active_insn +The way to refer to the operands in @var{condition} is to write +@code{operands[@var{i}]} for operand number @var{i} (as matched by +@code{(match_operand @var{i} @dots{})}). Use the variable @code{insn} +to refer to the last of the insns being matched; use +@code{prev_active_insn} to find the preceding insns. + +@findex dead_or_set_p +When optimizing computations with intermediate results, you can use +@var{condition} to match only when the intermediate results are not used +elsewhere. Use the C expression @code{dead_or_set_p (@var{insn}, +@var{op})}, where @var{insn} is the insn in which you expect the value +to be used for the last time (from the value of @code{insn}, together +with use of @code{prev_nonnote_insn}), and @var{op} is the intermediate +value (from @code{operands[@var{i}]}).@refill + +Applying the optimization means replacing the sequence of insns with one +new insn. The @var{template} controls ultimate output of assembler code +for this combined insn. It works exactly like the template of a +@code{define_insn}. Operand numbers in this template are the same ones +used in matching the original sequence of insns. 
+ +The result of a defined peephole optimizer does not need to match any of +the insn patterns in the machine description; it does not even have an +opportunity to match them. The peephole optimizer definition itself serves +as the insn pattern to control how the insn is output. + +Defined peephole optimizers are run as assembler code is being output, +so the insns they produce are never combined or rearranged in any way. + +Here is an example, taken from the 68000 machine description: + +@smallexample +(define_peephole + [(set (reg:SI 15) (plus:SI (reg:SI 15) (const_int 4))) + (set (match_operand:DF 0 "register_operand" "=f") + (match_operand:DF 1 "register_operand" "ad"))] + "FP_REG_P (operands[0]) && ! FP_REG_P (operands[1])" + "* +@{ + rtx xoperands[2]; + xoperands[1] = gen_rtx (REG, SImode, REGNO (operands[1]) + 1); +#ifdef MOTOROLA + output_asm_insn (\"move.l %1,(sp)\", xoperands); + output_asm_insn (\"move.l %1,-(sp)\", operands); + return \"fmove.d (sp)+,%0\"; +#else + output_asm_insn (\"movel %1,sp@@\", xoperands); + output_asm_insn (\"movel %1,sp@@-\", operands); + return \"fmoved sp@@+,%0\"; +#endif +@} +") +@end smallexample + +@need 1000 +The effect of this optimization is to change + +@smallexample +@group +jbsr _foobar +addql #4,sp +movel d1,sp@@- +movel d0,sp@@- +fmoved sp@@+,fp0 +@end group +@end smallexample + +@noindent +into + +@smallexample +@group +jbsr _foobar +movel d1,sp@@ +movel d0,sp@@- +fmoved sp@@+,fp0 +@end group +@end smallexample + +@ignore +@findex CC_REVERSED +If a peephole matches a sequence including one or more jump insns, you must +take account of the flags such as @code{CC_REVERSED} which specify that the +condition codes are represented in an unusual manner. The compiler +automatically alters any ordinary conditional jumps which occur in such +situations, but the compiler cannot alter jumps which have been replaced by +peephole optimizations. So it is up to you to alter the assembler code +that the peephole produces. 
Supply C code to write the assembler output, +and in this C code check the condition code status flags and change the +assembler code as appropriate. +@end ignore + +@var{insn-pattern-1} and so on look @emph{almost} like the second +operand of @code{define_insn}. There is one important difference: the +second operand of @code{define_insn} consists of one or more RTX's +enclosed in square brackets. Usually, there is only one: then the same +action can be written as an element of a @code{define_peephole}. But +when there are multiple actions in a @code{define_insn}, they are +implicitly enclosed in a @code{parallel}. Then you must explicitly +write the @code{parallel}, and the square brackets within it, in the +@code{define_peephole}. Thus, if an insn pattern looks like this, + +@smallexample +(define_insn "divmodsi4" + [(set (match_operand:SI 0 "general_operand" "=d") + (div:SI (match_operand:SI 1 "general_operand" "0") + (match_operand:SI 2 "general_operand" "dmsK"))) + (set (match_operand:SI 3 "general_operand" "=d") + (mod:SI (match_dup 1) (match_dup 2)))] + "TARGET_68020" + "divsl%.l %2,%3:%0") +@end smallexample + +@noindent +then the way to mention this insn in a peephole is as follows: + +@smallexample +(define_peephole + [@dots{} + (parallel + [(set (match_operand:SI 0 "general_operand" "=d") + (div:SI (match_operand:SI 1 "general_operand" "0") + (match_operand:SI 2 "general_operand" "dmsK"))) + (set (match_operand:SI 3 "general_operand" "=d") + (mod:SI (match_dup 1) (match_dup 2)))]) + @dots{}] + @dots{}) +@end smallexample + +@node define_peephole2 +@subsection RTL to RTL Peephole Optimizers +@findex define_peephole2 + +The @code{define_peephole2} definition tells the compiler how to +substitute one sequence of instructions for another sequence, +what additional scratch registers may be needed and what their +lifetimes must be. 
+ +@smallexample +(define_peephole2 + [@var{insn-pattern-1} + @var{insn-pattern-2} + @dots{}] + "@var{condition}" + [@var{new-insn-pattern-1} + @var{new-insn-pattern-2} + @dots{}] + "@var{preparation statements}") +@end smallexample + +The definition is almost identical to @code{define_split} +(@pxref{Insn Splitting}) except that the pattern to match is not a +single instruction, but a sequence of instructions. + +It is possible to request additional scratch registers for use in the +output template. If appropriate registers are not free, the pattern +will simply not match. + +@findex match_scratch +@findex match_dup +Scratch registers are requested with a @code{match_scratch} pattern at +the top level of the input pattern. The allocated register (initially) will +be dead at the point requested within the original sequence. If the scratch +is used at more than a single point, a @code{match_dup} pattern at the +top level of the input pattern marks the last position in the input sequence +at which the register must be available. + +Here is an example from the IA-32 machine description: + +@smallexample +(define_peephole2 + [(match_scratch:SI 2 "r") + (parallel [(set (match_operand:SI 0 "register_operand" "") + (match_operator:SI 3 "arith_or_logical_operator" + [(match_dup 0) + (match_operand:SI 1 "memory_operand" "")])) + (clobber (reg:CC 17))])] + "! optimize_size && ! TARGET_READ_MODIFY" + [(set (match_dup 2) (match_dup 1)) + (parallel [(set (match_dup 0) + (match_op_dup 3 [(match_dup 0) (match_dup 2)])) + (clobber (reg:CC 17))])] + "") +@end smallexample + +@noindent +This pattern tries to split a load from its use in the hopes that we'll be +able to schedule around the memory load latency. It allocates a single +@code{SImode} register of class @code{GENERAL_REGS} (@code{"r"}) that needs +to be live only at the point just before the arithmetic. 
+ +A real example requiring extended scratch lifetimes is harder to come by, +so here's a silly made-up example: + +@smallexample +(define_peephole2 + [(match_scratch:SI 4 "r") + (set (match_operand:SI 0 "" "") (match_operand:SI 1 "" "")) + (set (match_operand:SI 2 "" "") (match_dup 1)) + (match_dup 4) + (set (match_operand:SI 3 "" "") (match_dup 1))] + "@var{determine 1 does not overlap 0 and 2}" + [(set (match_dup 4) (match_dup 1)) + (set (match_dup 0) (match_dup 4)) + (set (match_dup 2) (match_dup 4)) + (set (match_dup 3) (match_dup 4))] + "") +@end smallexample + +@noindent +If we had not added the @code{(match_dup 4)} in the middle of the input +sequence, it might have been the case that the register we chose at the +beginning of the sequence is killed by the first or second @code{set}. + @node Insn Attributes @section Instruction Attributes @cindex insn attributes -- cgit v1.1