Compare commits

...

10 Commits

Author SHA1 Message Date
Bin Cheng
8ca74c42ca gimple-loop-interchange.cc (struct induction): Rename fields.
2017-12-05  Bin Cheng  <bin.cheng@arm.com>

	* gimple-loop-interchange.cc (struct induction): Rename fields.
	(dump_induction, loop_cand::analyze_induction_var): Update uses.
	(loop_cand::undo_simple_reduction): Ditto.
	(tree_loop_interchange::map_inductions_to_loop): Ditto.
	(tree_loop_interchange::can_interchange_loops): Delete.
	(tree_loop_interchange::interchange): Inline can_interchange_loops.

From-SVN: r255419
2017-12-05 13:40:08 +00:00
Richard Biener
ecf9b3e396 gimple-loop-interchange.cc (AVG_LOOP_NITER): Remove.
2017-12-05  Richard Biener  <rguenther@suse.de>

	* gimple-loop-interchange.cc (AVG_LOOP_NITER): Remove.
	(loop_cand::supported_operations): Simplify.
	(loop_cand::analyze_iloop_reduction_var): Use m_exit.
	(loop_cand::analyze_oloop_reduction_var): Likewise.
	(loop_cand::analyze_lcssa_phis): Likewise.
	(find_deps_in_bb_for_stmt): Use gimple_seq_add_stmt_without_update.
	(loop_cand::undo_simple_reduction): Likewise, properly release
	virtual defs.
	(tree_loop_interchange::interchange_loops): Likewise.  Move code
	to innner loop here.
	(tree_loop_interchange::map_inductions_to_loop): Remove code moving
	code to inner loop.
	(insert_pos_at_inner_loop): Inline into single caller...
	(tree_loop_interchange::move_code_to_inner): ...here.  Properly
	release virtual defs.
	(proper_loop_form_for_interchange): Properly analyze/instantiate SCEV.
	(prepare_perfect_loop_nest): Do not explicitely allocate vectors.

From-SVN: r255416
2017-12-05 13:20:21 +00:00
Richard Biener
24db14ad9c loop-interchange-12.c: New testcase.
2017-12-05  Richard Biener  <rguenther@suse.de>

	* gcc.dg/tree-ssa/loop-interchange-12.c: New testcase.
	* gcc.dg/tree-ssa/loop-interchange-13.c: Likewise.

From-SVN: r255405
2017-12-05 09:27:58 +00:00
Richard Biener
4c83982f27 gimple-loop-interchange.cc (loop_cand::classify_simple_reduction): Simplify.
2017-12-05  Richard Biener  <rguenther@suse.de>

	* gimple-loop-interchange.cc (loop_cand::classify_simple_reduction):
	Simplify.
	(loop_cand::analyze_iloop_reduction_var): Reject dead reductions.
	(loop_cand::analyze_oloop_reduction_var): Likewise.  Simplify.
	(tree_loop_interchange::interchange_loops): Properly analyze
	scalar evolution before instantiating a SCEV.

From-SVN: r255402
2017-12-05 08:06:08 +00:00
Richard Biener
be6a1ba281 tree-vectorizer.h (check_reduction_path): Declare.
2017-12-04  Richard Biener  <rguenther@suse.de>

	* tree-vectorizer.h (check_reduction_path): Declare.
	* tree-vect-loop.c (check_reduction_path): New function, split out
	from ...
	(vect_is_simple_reduction): ... here.
	* gimple-loop-interchange.cc: Include tree-vectorizer.h.
	(loop_cand::analyze_iloop_reduction_var): Use single_imm_use.
	Properly check for a supported reduction operation and a
	valid expression if the reduction covers multiple stmts.
	(prepare_perfect_loop_nest): Simpify allocation.
	(pass_linterchange::execute): Likewise.

	* gcc.dg/tree-ssa/loop-interchange-1.c: Add fast-math flags.
	* gcc.dg/tree-ssa/loop-interchange-1b.c: New test variant.
	* gcc.dg/tree-ssa/loop-interchange-4.c: XFAIL.

From-SVN: r255383
2017-12-04 15:08:56 +00:00
Bin Cheng
405a28c95d gimple-loop-interchange.cc (is-a.h): New header file.
2017-12-01  Bin Cheng  <bin.cheng@arm.com>

	* gimple-loop-interchange.cc (is-a.h): New header file.
	(loop_cand::find_reduction_by_stmt): Use dyn_cast instead of is_a<>
	and as_a<>.
	(loop_cand::analyze_iloop_reduction_var): Ditto.
	(loop_cand::analyze_oloop_reduction_var): Ditto.  Check gimple stmt
	against phi node directly.

From-SVN: r255310
2017-12-01 14:33:03 +00:00
Richard Biener
4c557c22f4 gimple-loop-interchange.cc (estimate_val_by_simplify_replace): Remove.
2017-12-01  Richard Biener  <rguenther@suse.de>

	* gimple-loop-interchange.cc (estimate_val_by_simplify_replace):
	Remove.
	(compute_access_stride): Rewrite using instantiate_scev,
	remove constant substitution.
	(should_interchange_loops): Adjust for non-constant strides.

From-SVN: r255306
2017-12-01 13:32:22 +00:00
Richard Biener
7d0fb9059a pr81303.f: New testcase.
2017-12-01  Richard Biener  <rguenther@suse.de>

	* gfortran.dg/pr81303.f: New testcase.

From-SVN: r255304
2017-12-01 11:59:52 +00:00
Bin Cheng
0100e9a980 Makefile.in (gimple-loop-interchange.o): New object file.
2017-11-28  Bin Cheng  <bin.cheng@arm.com>

	* Makefile.in (gimple-loop-interchange.o): New object file.
	* common.opt (floop-interchange): Reuse the option from graphite.
	* doc/invoke.texi (-floop-interchange): Ditto.  New document.
	* gimple-loop-interchange.cc: New file.
	* params.def (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS): New parameter.
	(PARAM_LOOP_INTERCHANGE_STRIDE_RATIO): New parameter.
	* passes.def (pass_linterchange): New pass.
	* timevar.def (TV_LINTERCHANGE): New time var.
	* tree-pass.h (make_pass_linterchange): New declaration.
	* tree-ssa-loop-ivcanon.c (create_canonical_iv): Change to external
	interchange.  Record IV before/after increment in new parameters.
	* tree-ssa-loop-ivopts.h (create_canonical_iv): New declaration.

gcc/testsuite
2017-11-28  Bin Cheng  <bin.cheng@arm.com>

	* gcc.dg/tree-ssa/loop-interchange-1.c: New test.
	* gcc.dg/tree-ssa/loop-interchange-2.c: New test.
	* gcc.dg/tree-ssa/loop-interchange-3.c: New test.
	* gcc.dg/tree-ssa/loop-interchange-4.c: New test.
	* gcc.dg/tree-ssa/loop-interchange-5.c: New test.
	* gcc.dg/tree-ssa/loop-interchange-6.c: New test.
	* gcc.dg/tree-ssa/loop-interchange-7.c: New test.
	* gcc.dg/tree-ssa/loop-interchange-8.c: New test.
	* gcc.dg/tree-ssa/loop-interchange-9.c: New test.
	* gcc.dg/tree-ssa/loop-interchange-10.c: New test.
	* gcc.dg/tree-ssa/loop-interchange-11.c: New test.

From-SVN: r255207
2017-11-28 15:49:10 +00:00
Bin Cheng
b00cc1176c tree-ssa-dce.c (simple_dce_from_worklist): Move and rename from tree-ssa-pre.c::remove_dead_inserted_code.
2017-11-28  Bin Cheng  <bin.cheng@arm.com>

	* tree-ssa-dce.c (simple_dce_from_worklist): Move and rename from
	tree-ssa-pre.c::remove_dead_inserted_code.
	* tree-ssa-dce.h: New file.
	* tree-ssa-pre.c (tree-ssa-dce.h): Include new header file.
	(remove_dead_inserted_code): Move and rename to function
	tree-ssa-dce.c::simple_dce_from_worklist.
	(pass_pre::execute): Update use.

From-SVN: r255206
2017-11-28 15:39:57 +00:00
30 changed files with 3107 additions and 167 deletions

View File

@@ -1304,6 +1304,7 @@ OBJS = \
gimple-iterator.o \
gimple-fold.o \
gimple-laddress.o \
gimple-loop-interchange.o \
gimple-low.o \
gimple-pretty-print.o \
gimple-ssa-backprop.o \

View File

@@ -1504,8 +1504,8 @@ Common Alias(floop-nest-optimize)
Enable loop nest transforms. Same as -floop-nest-optimize.
floop-interchange
Common Alias(floop-nest-optimize)
Enable loop nest transforms. Same as -floop-nest-optimize.
Common Report Var(flag_loop_interchange) Optimization
Enable loop interchange on trees.
floop-block
Common Alias(floop-nest-optimize)

View File

@@ -8513,12 +8513,10 @@ Perform loop optimizations on trees. This flag is enabled by default
at @option{-O} and higher.
@item -ftree-loop-linear
@itemx -floop-interchange
@itemx -floop-strip-mine
@itemx -floop-block
@itemx -floop-unroll-and-jam
@opindex ftree-loop-linear
@opindex floop-interchange
@opindex floop-strip-mine
@opindex floop-block
@opindex floop-unroll-and-jam
@@ -8613,6 +8611,25 @@ ENDDO
@end smallexample
and the initialization loop is transformed into a call to memset zero.
@item -floop-interchange
@opindex floop-interchange
Perform loop interchange outside of graphite. This flag can improve cache
performance on loop nest and allow further loop optimizations, like
vectorization, to take place. For example, the loop
@smallexample
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
for (int k = 0; k < N; k++)
c[i][j] = c[i][j] + a[i][k]*b[k][j];
@end smallexample
is transformed to
@smallexample
for (int i = 0; i < N; i++)
for (int k = 0; k < N; k++)
for (int j = 0; j < N; j++)
c[i][j] = c[i][j] + a[i][k]*b[k][j];
@end smallexample
@item -ftree-loop-im
@opindex ftree-loop-im
Perform loop invariant motion on trees. This pass moves only invariants that

File diff suppressed because it is too large Load Diff

View File

@@ -790,6 +790,20 @@ DEFPARAM (PARAM_L2_CACHE_SIZE,
"The size of L2 cache.",
512, 0, 0)
/* Maximum number of statements in loop nest for loop interchange. */
DEFPARAM (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS,
"loop-interchange-max-num-stmts",
"The maximum number of stmts in loop nest for loop interchange.",
64, 0, 0)
/* Minimum stride ratio for loop interchange to be profitiable. */
DEFPARAM (PARAM_LOOP_INTERCHANGE_STRIDE_RATIO,
"loop-interchange-stride-ratio",
"The minimum stride ratio for loop interchange to be profitable",
2, 0, 0)
/* Whether we should use canonical types rather than deep "structural"
type checking. Setting this value to 1 (the default) improves
compilation performance in the C++ and Objective-C++ front end;

View File

@@ -278,6 +278,7 @@ along with GCC; see the file COPYING3. If not see
NEXT_PASS (pass_cd_dce);
NEXT_PASS (pass_iv_canon);
NEXT_PASS (pass_loop_distribution);
NEXT_PASS (pass_linterchange);
NEXT_PASS (pass_copy_prop);
NEXT_PASS (pass_graphite);
PUSH_INSERT_PASSES_WITHIN (pass_graphite)

View File

@@ -0,0 +1,50 @@
/* { dg-do run } */
/* { dg-options "-O2 -floop-interchange -fassociative-math -fno-signed-zeros -fno-trapping-math -fdump-tree-linterchange-details" } */
/* Copied from graphite/interchange-4.c */
#define DEBUG 0
#if DEBUG
#include <stdio.h>
#endif
double u[1782225];
static int __attribute__((noinline))
foo (int N, int *res)
{
int i, j;
double sum = 0;
for (i = 0; i < N; i++)
for (j = 0; j < N; j++)
sum = sum + u[i + 1335 * j];
for (i = 0; i < N; i++)
u[1336 * i] *= 2;
*res = sum + N + u[1336 * 2] + u[1336];
}
extern void abort ();
int
main (void)
{
int i, j, res;
for (i = 0; i < 1782225; i++)
u[i] = 2;
foo (1335, &res);
#if DEBUG
fprintf (stderr, "res = %d \n", res);
#endif
if (res != 3565793)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is interchanged" 1 "linterchange"} } */

View File

@@ -0,0 +1,43 @@
/* { dg-do run } */
/* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */
#define M 256
int a[M][M], b[M][M];
int __attribute__((noinline))
double_reduc (int n)
{
int sum = 0;
for (int j = 0; j < n; j++)
{
for (int i = 0; i < n; i++)
sum = sum + a[i][j]*b[i][j];
}
return sum;
}
extern void abort ();
static void __attribute__((noinline))
init (int i)
{
for (int j = 0; j < M; j++)
{
a[i][j] = i;
b[i][j] = j;
}
}
int main (void)
{
for (int i = 0; i < M; ++i)
init (i);
int sum = double_reduc (M);
if (sum != 1065369600)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is interchanged" 1 "linterchange" } } */

View File

@@ -0,0 +1,22 @@
/* { dg-do compile } */
/* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */
#define M 256
int a[M][M], b[M][M];
void
simple_reduc_1 (int n, int *p)
{
for (int j = 0; j < n; j++)
{
int sum = p[j];
for (int i = 0; i < n; i++)
{
sum = sum + b[i][j];
b[i][j] += a[i][j];
}
p[j] = sum;
}
}
/* { dg-final { scan-tree-dump-not "Loop_pair<outer:., inner:.> is interchanged" "linterchange" } } */

View File

@@ -0,0 +1,50 @@
/* { dg-do run } */
/* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */
/* Copied from graphite/interchange-4.c */
#define DEBUG 0
#if DEBUG
#include <stdio.h>
#endif
unsigned u[1024];
static void __attribute__((noinline,noclone,noipa))
foo (int N, unsigned *res)
{
int i, j;
unsigned sum = 1;
for (i = 0; i < N; i++)
for (j = 0; j < N; j++)
sum = u[i + 2 * j] / sum;
*res = sum;
}
extern void abort ();
int
main (void)
{
int i, j;
unsigned res;
u[0] = 10;
u[1] = 200;
u[2] = 10;
u[3] = 10;
foo (2, &res);
#if DEBUG
fprintf (stderr, "res = %d \n", res);
#endif
if (res != 0)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-not "is interchanged" "linterchange"} } */

View File

@@ -0,0 +1,53 @@
/* { dg-do run } */
/* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */
/* Copied from graphite/interchange-4.c */
#define DEBUG 0
#if DEBUG
#include <stdio.h>
#endif
unsigned u[1024];
static void __attribute__((noinline,noclone,noipa))
foo (int N, int M, unsigned *res)
{
int i, j;
unsigned sum = 0;
if (N > 0)
for (i = 0; i < M; i++)
for (j = 0; j < N; j++)
sum = u[i + 3 * j] - sum;
*res = sum;
}
extern void abort ();
int
main (void)
{
int i, j;
unsigned res;
u[0] = 1;
u[1] = 2;
u[2] = 4;
u[3] = 5;
u[4] = 7;
u[5] = 8;
foo (2, 3, &res);
#if DEBUG
fprintf (stderr, "res = %d \n", res);
#endif
if (res != 13)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-not "is interchanged" "linterchange"} } */

View File

@@ -0,0 +1,52 @@
/* { dg-do run } */
/* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */
/* Copied from graphite/interchange-4.c */
#define DEBUG 0
#if DEBUG
#include <stdio.h>
#endif
double u[1782225];
static void __attribute__((noinline))
foo (int N, double *res)
{
int i, j;
double sum = 0;
for (i = 0; i < N; i++)
for (j = 0; j < N; j++)
sum = sum + u[i + 1335 * j];
*res = sum;
}
extern void abort ();
int
main (void)
{
int i, j;
double res;
for (i = 0; i < 1782225; i++)
u[i] = 0;
u[0] = __DBL_MAX__;
u[1335] = -__DBL_MAX__;
u[1] = __DBL_MAX__;
u[1336] = -__DBL_MAX__;
foo (1335, &res);
#if DEBUG
fprintf (stderr, "res = %d \n", res);
#endif
if (res != 0.0)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-not "is interchanged" "linterchange"} } */

View File

@@ -0,0 +1,58 @@
/* { dg-do run } */
/* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */
/* Copied from graphite/interchange-5.c */
#define DEBUG 0
#if DEBUG
#include <stdio.h>
#endif
#define N 100
#define M 1111
int A[N][M];
static int __attribute__((noinline))
foo (void)
{
int i, j;
for( i = 0; i < M; i++)
for( j = 0; j < N; j++)
A[j][i] = 5 * A[j][i];
return A[0][0] + A[N-1][M-1];
}
extern void abort ();
static void __attribute__((noinline))
init (int i)
{
int j;
for (j = 0; j < M; j++)
A[i][j] = 2;
}
int
main (void)
{
int i, j, res;
for (i = 0; i < N; i++)
init (i);
res = foo ();
#if DEBUG
fprintf (stderr, "res = %d \n", res);
#endif
if (res != 20)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is interchanged" 1 "linterchange"} } */

View File

@@ -0,0 +1,59 @@
/* { dg-do run } */
/* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */
/* Copied from graphite/interchange-6.c */
#define DEBUG 0
#if DEBUG
#include <stdio.h>
#endif
#define N 100
#define M 200
static int __attribute__((noinline))
foo (int A[N][M])
{
int i, j;
/* This loop should be interchanged. */
for(j = 0; j < M; j++)
for(i = 0; i < N; i++)
A[i][j] = A[i][j] + A[i][j];
return A[0][0] + A[N-1][M-1];
}
extern void abort ();
static void __attribute__((noinline))
init (int *arr, int i)
{
int j;
for (j = 0; j < M; j++)
arr[j] = 2;
}
int
main (void)
{
int A[N][M];
int i, j, res;
for (i = 0; i < N; i++)
init (A[i], i);
res = foo (A);
#if DEBUG
fprintf (stderr, "res = %d \n", res);
#endif
if (res != 8)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is interchanged" 1 "linterchange"} } */

View File

@@ -0,0 +1,50 @@
/* { dg-do run } */
/* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */
/* Copied from graphite/interchange-7.c */
#define DEBUG 0
#if DEBUG
#include <stdio.h>
#endif
#define N 111
#define M 1111
static int __attribute__((noinline))
foo (double *a)
{
int i,j;
int r = 0;
for (i = 0; i < N; ++i)
for (j = 0; j < M; ++j)
r += a[j * N + i];
return r;
}
extern void abort ();
int
main (void)
{
double A[N*M];
int i, res;
for (i = 0; i < N*M; i++)
A[i] = 2;
res = foo (A);
#if DEBUG
fprintf (stderr, "res = %d \n", res);
#endif
if (res != 246642)
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is interchanged" 1 "linterchange" { xfail *-*-* } } } */

View File

@@ -0,0 +1,71 @@
/* { dg-do run } */
/* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */
#define M 256
int a[M][M], b[M][M], c[M][M], d[M][M];
void __attribute__((noinline))
matrix_mul_1 (int n)
{
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
for (int k = 0; k < n; k++)
c[i][j] = c[i][j] + a[i][k]*b[k][j];
}
void __attribute__((noinline))
matrix_mul_2 (int n)
{
for (int i = 0; i < n; i++)
{
for (int j = 0; j < n; j++)
{
for (int k = 0; k < n; k++)
d[i][j] = d[i][j] + a[i][k]*b[k][j];
asm volatile ("" ::: "memory");
}
asm volatile ("" ::: "memory");
}
}
extern void abort ();
static void __attribute__((noinline))
init (int i)
{
for (int j = 0; j < M; j++)
{
a[i][j] = i;
b[i][j] = j;
c[i][j] = 0;
d[i][j] = 0;
}
}
static int __attribute__((noinline))
check (int i)
{
for (int j = 0; j < M; j++)
if (c[i][j] != d[i][j])
return 0;
return 1;
}
int main (void)
{
for (int i = 0; i < M; ++i)
init (i);
matrix_mul_1 (M);
matrix_mul_2 (M);
for (int i = 0; i < M; ++i)
if (!check (i))
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is interchanged" 1 "linterchange" } } */
/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is not interchanged" 1 "linterchange" } } */

View File

@@ -0,0 +1,70 @@
/* { dg-do run } */
/* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */
#define M 256
int a[M][M], b[M][M], c[M][M], d[M][M];
void __attribute__((noinline))
matrix_mul_1 (int n)
{
for (int j = 0; j < n; j++)
for (int k = 0; k < n; k++)
for (int i = 0; i < n; i++)
c[i][j] = c[i][j] + a[i][k]*b[k][j];
}
void __attribute__((noinline))
matrix_mul_2 (int n)
{
for (int i = 0; i < n; i++)
{
for (int j = 0; j < n; j++)
{
for (int k = 0; k < n; k++)
d[i][j] = d[i][j] + a[i][k]*b[k][j];
asm volatile ("" ::: "memory");
}
asm volatile ("" ::: "memory");
}
}
extern void abort ();
static void __attribute__((noinline))
init (int i)
{
for (int j = 0; j < M; j++)
{
a[i][j] = i;
b[i][j] = j;
c[i][j] = 0;
d[i][j] = 0;
}
}
static int __attribute__((noinline))
check (int i)
{
for (int j = 0; j < M; j++)
if (c[i][j] != d[i][j])
return 0;
return 1;
}
int main (void)
{
for (int i = 0; i < M; ++i)
init (i);
matrix_mul_1 (M);
matrix_mul_2 (M);
for (int i = 0; i < M; ++i)
if (!check (i))
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is interchanged" 2 "linterchange" } } */

View File

@@ -0,0 +1,70 @@
/* { dg-do run } */
/* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */
#define M 256
int a[M][M], b[M][M], c[M][M], d[M][M];
void __attribute__((noinline))
matrix_mul_1 (int n)
{
for (int k = 0; k < n; k++)
for (int j = 0; j < n; j++)
for (int i = 0; i < n; i++)
c[i][j] = c[i][j] + a[i][k]*b[k][j];
}
void __attribute__((noinline))
matrix_mul_2 (int n)
{
for (int i = 0; i < n; i++)
{
for (int j = 0; j < n; j++)
{
for (int k = 0; k < n; k++)
d[i][j] = d[i][j] + a[i][k]*b[k][j];
asm volatile ("" ::: "memory");
}
asm volatile ("" ::: "memory");
}
}
extern void abort ();
static void __attribute__((noinline))
init (int i)
{
for (int j = 0; j < M; j++)
{
a[i][j] = i;
b[i][j] = j;
c[i][j] = 0;
d[i][j] = 0;
}
}
static int __attribute__((noinline))
check (int i)
{
for (int j = 0; j < M; j++)
if (c[i][j] != d[i][j])
return 0;
return 1;
}
int main (void)
{
for (int i = 0; i < M; ++i)
init (i);
matrix_mul_1 (M);
matrix_mul_2 (M);
for (int i = 0; i < M; ++i)
if (!check (i))
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is interchanged" 2 "linterchange" } } */

View File

@@ -0,0 +1,70 @@
/* { dg-do run } */
/* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */
#define M 256
int a[M][M], b[M][M], c[M][M], d[M][M];
void __attribute__((noinline))
matrix_mul_1 (int n)
{
for (int i = 0; i < n; i++)
for (int k = 0; k < n; k++)
for (int j = 0; j < n; j++)
c[i][j] = c[i][j] + a[i][k]*b[k][j];
}
void __attribute__((noinline))
matrix_mul_2 (int n)
{
for (int i = 0; i < n; i++)
{
for (int j = 0; j < n; j++)
{
for (int k = 0; k < n; k++)
d[i][j] = d[i][j] + a[i][k]*b[k][j];
asm volatile ("" ::: "memory");
}
asm volatile ("" ::: "memory");
}
}
extern void abort ();
static void __attribute__((noinline))
init (int i)
{
for (int j = 0; j < M; j++)
{
a[i][j] = i;
b[i][j] = j;
c[i][j] = 0;
d[i][j] = 0;
}
}
static int __attribute__((noinline))
check (int i)
{
for (int j = 0; j < M; j++)
if (c[i][j] != d[i][j])
return 0;
return 1;
}
int main (void)
{
for (int i = 0; i < M; ++i)
init (i);
matrix_mul_1 (M);
matrix_mul_2 (M);
for (int i = 0; i < M; ++i)
if (!check (i))
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-not "Loop_pair<outer:., inner:.> is interchanged" "linterchange" } } */

View File

@@ -0,0 +1,62 @@
/* { dg-do run } */
/* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */
#define M 256
int a[M][M], b[M][M], c[M], d[M];
void __attribute__((noinline))
simple_reduc_1 (int n)
{
for (int j = 0; j < n; j++)
{
int sum = c[j];
for (int i = 0; i < n; i++)
sum = sum + a[i][j]*b[i][j];
c[j] = sum;
}
}
void __attribute__((noinline))
simple_reduc_2 (int n)
{
for (int j = 0; j < n; j++)
{
int sum = d[j];
for (int i = 0; i < n; i++)
sum = sum + a[i][j]*b[i][j];
asm volatile ("" ::: "memory");
d[j] = sum;
}
}
extern void abort ();
static void __attribute__((noinline))
init (int i)
{
c[i] = 0;
d[i] = 0;
for (int j = 0; j < M; j++)
{
a[i][j] = i;
b[i][j] = j;
}
}
int main (void)
{
for (int i = 0; i < M; ++i)
init (i);
simple_reduc_1 (M);
simple_reduc_2 (M);
for (int i = 0; i < M; ++i)
if (c[i] != d[i])
abort ();
return 0;
}
/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is interchanged" 1 "linterchange" } } */

View File

@@ -0,0 +1,44 @@
! { dg-do compile }
! { dg-options "-O3 -ffast-math -floop-interchange -fdump-tree-linterchange-details" }
subroutine mat_times_vec(y,x,a,axp,ayp,azp,axm,aym,azm,
$ nb,nx,ny,nz)
implicit none
integer nb,nx,ny,nz,i,j,k,m,l,kit,im1,ip1,jm1,jp1,km1,kp1
real*8 y(nb,nx,ny,nz),x(nb,nx,ny,nz)
real*8 a(nb,nb,nx,ny,nz),
1 axp(nb,nb,nx,ny,nz),ayp(nb,nb,nx,ny,nz),azp(nb,nb,nx,ny,nz),
2 axm(nb,nb,nx,ny,nz),aym(nb,nb,nx,ny,nz),azm(nb,nb,nx,ny,nz)
do k=1,nz
km1=mod(k+nz-2,nz)+1
kp1=mod(k,nz)+1
do j=1,ny
jm1=mod(j+ny-2,ny)+1
jp1=mod(j,ny)+1
do i=1,nx
im1=mod(i+nx-2,nx)+1
ip1=mod(i,nx)+1
do l=1,nb
y(l,i,j,k)=0.0d0
do m=1,nb
y(l,i,j,k)=y(l,i,j,k)+
1 a(l,m,i,j,k)*x(m,i,j,k)+
2 axp(l,m,i,j,k)*x(m,ip1,j,k)+
3 ayp(l,m,i,j,k)*x(m,i,jp1,k)+
4 azp(l,m,i,j,k)*x(m,i,j,kp1)+
5 axm(l,m,i,j,k)*x(m,im1,j,k)+
6 aym(l,m,i,j,k)*x(m,i,jm1,k)+
7 azm(l,m,i,j,k)*x(m,i,j,km1)
enddo
enddo
enddo
enddo
enddo
return
end
! { dg-final { scan-tree-dump-times "is interchanged" 1 "linterchange" } }

View File

@@ -184,6 +184,7 @@ DEFTIMEVAR (TV_TREE_LOOP , "tree loop optimization")
DEFTIMEVAR (TV_TREE_NOLOOP , "loopless fn")
DEFTIMEVAR (TV_TREE_LOOP_BOUNDS , "tree loop bounds")
DEFTIMEVAR (TV_LIM , "tree loop invariant motion")
DEFTIMEVAR (TV_LINTERCHANGE , "tree loop interchange")
DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv")
DEFTIMEVAR (TV_SCEV_CONST , "scev constant prop")
DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH , "tree loop unswitching")

View File

@@ -368,6 +368,7 @@ extern gimple_opt_pass *make_pass_tree_loop (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_tree_no_loop (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_linterchange (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);

View File

@@ -1723,3 +1723,56 @@ make_pass_cd_dce (gcc::context *ctxt)
{
return new pass_cd_dce (ctxt);
}
/* A cheap DCE interface starting from a seed set of possibly dead stmts. */
void
simple_dce_from_worklist (bitmap seeds)
{
/* ??? Re-use seeds as worklist not only as initial set. This may end up
removing more code as well. If we keep seeds unchanged we could restrict
new worklist elements to members of seed. */
bitmap worklist = seeds;
while (! bitmap_empty_p (worklist))
{
/* Pop item. */
unsigned i = bitmap_first_set_bit (worklist);
bitmap_clear_bit (worklist, i);
tree def = ssa_name (i);
/* Removed by somebody else or still in use. */
if (! def || ! has_zero_uses (def))
continue;
gimple *t = SSA_NAME_DEF_STMT (def);
if (gimple_has_side_effects (t))
continue;
/* Add uses to the worklist. */
ssa_op_iter iter;
use_operand_p use_p;
FOR_EACH_PHI_OR_STMT_USE (use_p, t, iter, SSA_OP_USE)
{
tree use = USE_FROM_PTR (use_p);
if (TREE_CODE (use) == SSA_NAME
&& ! SSA_NAME_IS_DEFAULT_DEF (use))
bitmap_set_bit (worklist, SSA_NAME_VERSION (use));
}
/* Remove stmt. */
if (dump_file && (dump_flags & TDF_DETAILS))
{
fprintf (dump_file, "Removing dead stmt:");
print_gimple_stmt (dump_file, t, 0);
}
gimple_stmt_iterator gsi = gsi_for_stmt (t);
if (gimple_code (t) == GIMPLE_PHI)
remove_phi_node (&gsi, true);
else
{
gsi_remove (&gsi, true);
release_defs (t);
}
}
}

22
gcc/tree-ssa-dce.h Normal file
View File

@@ -0,0 +1,22 @@
/* Copyright (C) 2017 Free Software Foundation, Inc.
This file is part of GCC.
GCC is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any
later version.
GCC is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3. If not see
<http://www.gnu.org/licenses/>. */
#ifndef TREE_SSA_DCE_H
#define TREE_SSA_DCE_H
extern void simple_dce_from_worklist (bitmap);
#endif

View File

@@ -76,10 +76,13 @@ enum unroll_level
};
/* Adds a canonical induction variable to LOOP iterating NITER times. EXIT
is the exit edge whose condition is replaced. */
is the exit edge whose condition is replaced. The ssa versions of the new
IV before and after increment will be stored in VAR_BEFORE and VAR_AFTER
if they are not NULL. */
static void
create_canonical_iv (struct loop *loop, edge exit, tree niter)
void
create_canonical_iv (struct loop *loop, edge exit, tree niter,
tree *var_before = NULL, tree *var_after = NULL)
{
edge in;
tree type, var;
@@ -112,7 +115,9 @@ create_canonical_iv (struct loop *loop, edge exit, tree niter)
create_iv (niter,
build_int_cst (type, -1),
NULL_TREE, loop,
&incr_at, false, NULL, &var);
&incr_at, false, var_before, &var);
if (var_after)
*var_after = var;
cmp = (exit->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
gimple_cond_set_code (cond, cmp);

View File

@@ -32,4 +32,6 @@ extern tree strip_offset (tree, unsigned HOST_WIDE_INT *);
bool may_be_nonaddressable_p (tree expr);
void tree_ssa_iv_optimize (void);
void create_canonical_iv (struct loop *, edge, tree,
tree * = NULL, tree * = NULL);
#endif /* GCC_TREE_SSA_LOOP_IVOPTS_H */

View File

@@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. If not see
#include "dbgcnt.h"
#include "domwalk.h"
#include "tree-ssa-propagate.h"
#include "tree-ssa-dce.h"
#include "tree-cfgcleanup.h"
#include "alias.h"
@@ -3995,64 +3996,6 @@ compute_avail (void)
free (worklist);
}
/* Cheap DCE of a known set of possibly dead stmts.
Because we don't follow exactly the standard PRE algorithm, and decide not
to insert PHI nodes sometimes, and because value numbering of casts isn't
perfect, we sometimes end up inserting dead code. This simple DCE-like
pass removes any insertions we made that weren't actually used. */
static void
remove_dead_inserted_code (void)
{
/* ??? Re-use inserted_exprs as worklist not only as initial set.
This may end up removing non-inserted code as well. If we
keep inserted_exprs unchanged we could restrict new worklist
elements to members of inserted_exprs. */
bitmap worklist = inserted_exprs;
while (! bitmap_empty_p (worklist))
{
/* Pop item. */
unsigned i = bitmap_first_set_bit (worklist);
bitmap_clear_bit (worklist, i);
tree def = ssa_name (i);
/* Removed by somebody else or still in use. */
if (! def || ! has_zero_uses (def))
continue;
gimple *t = SSA_NAME_DEF_STMT (def);
if (gimple_has_side_effects (t))
continue;
/* Add uses to the worklist. */
ssa_op_iter iter;
use_operand_p use_p;
FOR_EACH_PHI_OR_STMT_USE (use_p, t, iter, SSA_OP_USE)
{
tree use = USE_FROM_PTR (use_p);
if (TREE_CODE (use) == SSA_NAME
&& ! SSA_NAME_IS_DEFAULT_DEF (use))
bitmap_set_bit (worklist, SSA_NAME_VERSION (use));
}
/* Remove stmt. */
if (dump_file && (dump_flags & TDF_DETAILS))
{
fprintf (dump_file, "Removing unnecessary insertion:");
print_gimple_stmt (dump_file, t, 0);
}
gimple_stmt_iterator gsi = gsi_for_stmt (t);
if (gimple_code (t) == GIMPLE_PHI)
remove_phi_node (&gsi, true);
else
{
gsi_remove (&gsi, true);
release_defs (t);
}
}
}
/* Initialize data structures used by PRE. */
@@ -4188,7 +4131,11 @@ pass_pre::execute (function *fun)
/* Remove all the redundant expressions. */
todo |= vn_eliminate (inserted_exprs);
remove_dead_inserted_code ();
/* Because we don't follow exactly the standard PRE algorithm, and decide not
to insert PHI nodes sometimes, and because value numbering of casts isn't
perfect, we sometimes end up inserting dead code. This simple DCE-like
pass removes any insertions we made that weren't actually used. */
simple_dce_from_worklist (inserted_exprs);
fini_pre ();

View File

@@ -2626,6 +2626,114 @@ vect_is_slp_reduction (loop_vec_info loop_info, gimple *phi,
}
/* Return true if the reduction PHI in LOOP with latch arg LOOP_ARG and
reduction operation CODE has a handled computation expression. */
bool
check_reduction_path (location_t loc, loop_p loop, gphi *phi, tree loop_arg,
enum tree_code code)
{
auto_vec<std::pair<ssa_op_iter, use_operand_p> > path;
auto_bitmap visited;
tree lookfor = PHI_RESULT (phi);
ssa_op_iter curri;
use_operand_p curr = op_iter_init_phiuse (&curri, phi, SSA_OP_USE);
while (USE_FROM_PTR (curr) != loop_arg)
curr = op_iter_next_use (&curri);
curri.i = curri.numops;
do
{
path.safe_push (std::make_pair (curri, curr));
tree use = USE_FROM_PTR (curr);
if (use == lookfor)
break;
gimple *def = SSA_NAME_DEF_STMT (use);
if (gimple_nop_p (def)
|| ! flow_bb_inside_loop_p (loop, gimple_bb (def)))
{
pop:
do
{
std::pair<ssa_op_iter, use_operand_p> x = path.pop ();
curri = x.first;
curr = x.second;
do
curr = op_iter_next_use (&curri);
/* Skip already visited or non-SSA operands (from iterating
over PHI args). */
while (curr != NULL_USE_OPERAND_P
&& (TREE_CODE (USE_FROM_PTR (curr)) != SSA_NAME
|| ! bitmap_set_bit (visited,
SSA_NAME_VERSION
(USE_FROM_PTR (curr)))));
}
while (curr == NULL_USE_OPERAND_P && ! path.is_empty ());
if (curr == NULL_USE_OPERAND_P)
break;
}
else
{
if (gimple_code (def) == GIMPLE_PHI)
curr = op_iter_init_phiuse (&curri, as_a <gphi *>(def), SSA_OP_USE);
else
curr = op_iter_init_use (&curri, def, SSA_OP_USE);
while (curr != NULL_USE_OPERAND_P
&& (TREE_CODE (USE_FROM_PTR (curr)) != SSA_NAME
|| ! bitmap_set_bit (visited,
SSA_NAME_VERSION
(USE_FROM_PTR (curr)))))
curr = op_iter_next_use (&curri);
if (curr == NULL_USE_OPERAND_P)
goto pop;
}
}
while (1);
if (dump_file && (dump_flags & TDF_DETAILS))
{
dump_printf_loc (MSG_NOTE, loc, "reduction path: ");
unsigned i;
std::pair<ssa_op_iter, use_operand_p> *x;
FOR_EACH_VEC_ELT (path, i, x)
{
dump_generic_expr (MSG_NOTE, TDF_SLIM, USE_FROM_PTR (x->second));
dump_printf (MSG_NOTE, " ");
}
dump_printf (MSG_NOTE, "\n");
}
/* Check whether the reduction path detected is valid. */
bool fail = path.length () == 0;
bool neg = false;
for (unsigned i = 1; i < path.length (); ++i)
{
gimple *use_stmt = USE_STMT (path[i].second);
tree op = USE_FROM_PTR (path[i].second);
if (! has_single_use (op)
|| ! is_gimple_assign (use_stmt))
{
fail = true;
break;
}
if (gimple_assign_rhs_code (use_stmt) != code)
{
if (code == PLUS_EXPR
&& gimple_assign_rhs_code (use_stmt) == MINUS_EXPR)
{
/* Track whether we negate the reduction value each iteration. */
if (gimple_assign_rhs2 (use_stmt) == op)
neg = ! neg;
}
else
{
fail = true;
break;
}
}
}
return ! fail && ! neg;
}
/* Function vect_is_simple_reduction
(1) Detect a cross-iteration def-use cycle that represents a simple
@@ -3128,106 +3236,8 @@ vect_is_simple_reduction (loop_vec_info loop_info, gimple *phi,
}
/* Look for the expression computing loop_arg from loop PHI result. */
auto_vec<std::pair<ssa_op_iter, use_operand_p> > path;
auto_bitmap visited;
tree lookfor = PHI_RESULT (phi);
ssa_op_iter curri;
use_operand_p curr = op_iter_init_phiuse (&curri, as_a <gphi *>(phi),
SSA_OP_USE);
while (USE_FROM_PTR (curr) != loop_arg)
curr = op_iter_next_use (&curri);
curri.i = curri.numops;
do
{
path.safe_push (std::make_pair (curri, curr));
tree use = USE_FROM_PTR (curr);
if (use == lookfor)
break;
gimple *def = SSA_NAME_DEF_STMT (use);
if (gimple_nop_p (def)
|| ! flow_bb_inside_loop_p (loop, gimple_bb (def)))
{
pop:
do
{
std::pair<ssa_op_iter, use_operand_p> x = path.pop ();
curri = x.first;
curr = x.second;
do
curr = op_iter_next_use (&curri);
/* Skip already visited or non-SSA operands (from iterating
over PHI args). */
while (curr != NULL_USE_OPERAND_P
&& (TREE_CODE (USE_FROM_PTR (curr)) != SSA_NAME
|| ! bitmap_set_bit (visited,
SSA_NAME_VERSION
(USE_FROM_PTR (curr)))));
}
while (curr == NULL_USE_OPERAND_P && ! path.is_empty ());
if (curr == NULL_USE_OPERAND_P)
break;
}
else
{
if (gimple_code (def) == GIMPLE_PHI)
curr = op_iter_init_phiuse (&curri, as_a <gphi *>(def), SSA_OP_USE);
else
curr = op_iter_init_use (&curri, def, SSA_OP_USE);
while (curr != NULL_USE_OPERAND_P
&& (TREE_CODE (USE_FROM_PTR (curr)) != SSA_NAME
|| ! bitmap_set_bit (visited,
SSA_NAME_VERSION
(USE_FROM_PTR (curr)))))
curr = op_iter_next_use (&curri);
if (curr == NULL_USE_OPERAND_P)
goto pop;
}
}
while (1);
if (dump_file && (dump_flags & TDF_DETAILS))
{
dump_printf_loc (MSG_NOTE, vect_location,
"reduction path: ");
unsigned i;
std::pair<ssa_op_iter, use_operand_p> *x;
FOR_EACH_VEC_ELT (path, i, x)
{
dump_generic_expr (MSG_NOTE, TDF_SLIM, USE_FROM_PTR (x->second));
dump_printf (MSG_NOTE, " ");
}
dump_printf (MSG_NOTE, "\n");
}
/* Check whether the reduction path detected is valid. */
bool fail = path.length () == 0;
bool neg = false;
for (unsigned i = 1; i < path.length (); ++i)
{
gimple *use_stmt = USE_STMT (path[i].second);
tree op = USE_FROM_PTR (path[i].second);
if (! has_single_use (op)
|| ! is_gimple_assign (use_stmt))
{
fail = true;
break;
}
if (gimple_assign_rhs_code (use_stmt) != code)
{
if (code == PLUS_EXPR
&& gimple_assign_rhs_code (use_stmt) == MINUS_EXPR)
{
/* Track whether we negate the reduction value each iteration. */
if (gimple_assign_rhs2 (use_stmt) == op)
neg = ! neg;
}
else
{
fail = true;
break;
}
}
}
if (! fail && ! neg)
if (check_reduction_path (vect_location, loop, as_a <gphi *> (phi), loop_arg,
code))
return def_stmt;
if (dump_enabled_p ())

View File

@@ -1255,6 +1255,9 @@ extern tree vect_create_addr_base_for_vector_ref (gimple *, gimple_seq *,
/* FORNOW: Used in tree-parloops.c. */
extern gimple *vect_force_simple_reduction (loop_vec_info, gimple *,
bool *, bool);
/* Used in gimple-loop-interchange.c. */
extern bool check_reduction_path (location_t, loop_p, gphi *, tree,
enum tree_code);
/* Drive for loop analysis stage. */
extern loop_vec_info vect_analyze_loop (struct loop *, loop_vec_info);
extern tree vect_build_loop_niters (loop_vec_info, bool * = NULL);