1 files changed, 76 insertions, 0 deletions
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index aa73f82..5768f08 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -338,6 +338,7 @@ Objective-C and Objective-C++ Dialects}.
 -fira-coalesce -fno-ira-share-save-slots @gol
 -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
 -fivopts -fkeep-inline-functions -fkeep-static-consts @gol
+-floop-block -floop-interchange -floop-strip-mine @gol
 -fmerge-all-constants -fmerge-constants -fmodulo-sched @gol
 -fmodulo-sched-allow-regmoves -fmove-loop-invariants -fmudflap @gol
 -fmudflapir -fmudflapth -fno-branch-count-reg -fno-default-inline @gol
@@ -6012,6 +6013,81 @@ at @option{-O} and higher.
 Perform linear loop transformations on tree.  This flag can improve cache
 performance and allow further loop optimizations to take place.
 
+@item -floop-interchange
+Perform loop interchange transformations on loops.  Interchanging two
+nested loops switches the inner and outer loops.  For example, given a
+loop like:
+@smallexample
+DO J = 1, M
+  DO I = 1, N
+    A(J, I) = A(J, I) * C
+  ENDDO
+ENDDO
+@end smallexample
+loop interchange will transform the loop as if the user had written:
+@smallexample
+DO I = 1, N
+  DO J = 1, M
+    A(J, I) = A(J, I) * C
+  ENDDO
+ENDDO
+@end smallexample
+which can be beneficial when @code{N} is larger than the caches,
+because in Fortran, the elements of an array are stored in memory
+contiguously by column, and the original loop iterates over rows,
+potentially creating at each access a cache miss.  This optimization
+applies to all the languages supported by GCC and is not limited to
+Fortran.
+
+@item -floop-strip-mine
+Perform loop strip mining transformations on loops.  Strip mining
+splits a loop into two nested loops.  The outer loop has strides 
+equal to the strip size and the inner loop has strides of the 
+original loop within a strip.  For example, given a loop like:
+@smallexample
+DO I = 1, N
+  A(I) = A(I) + C
+ENDDO
+@end smallexample
+loop strip mining will transform the loop as if the user had written:
+@smallexample
+DO II = 1, N, 4
+  DO I = II, min (II + 4, N)
+    A(I) = A(I) + C
+  ENDDO
+ENDDO
+@end smallexample
+This optimization applies to all the languages supported by GCC and is
+not limited to Fortran.
+
+@item -floop-block
+Perform loop blocking transformations on loops.  Blocking strip mines
+each loop in the loop nest such that the memory accesses of the
+element loops fit inside caches.  For example, given a loop like:
+@smallexample
+DO I = 1, N
+  DO J = 1, M
+    A(J, I) = B(I) + C(J)
+  ENDDO
+ENDDO
+@end smallexample
+loop blocking will transform the loop as if the user had written:
+@smallexample
+DO II = 1, N, 64
+  DO JJ = 1, M, 64
+    DO I = II, min (II + 64, N)
+      DO J = JJ, min (JJ + 64, M)
+        A(J, I) = B(I) + C(J)
+      ENDDO
+    ENDDO
+  ENDDO
+ENDDO
+@end smallexample
+which can be beneficial when @code{M} is larger than the caches,
+because the innermost loop will iterate over a smaller amount of data
+that can be kept in the caches.  This optimization applies to all the
+languages supported by GCC and is not limited to Fortran.
+
 @item -fcheck-data-deps
 @opindex fcheck-data-deps
 Compare the results of several data dependence analyzers.  This option