rb6.F Program Description
File Name: rb6.F
Description:
Like rb4.F and rb5.F, rb6.F performs one sweep
through the grid, updating each node m times.
However, rb6.F tries to update each node as early as
possible. Hence, when a red node in line i is updated
for the first time, the black node directly underneath
in line i-1 is also updated for the first time.
As a consequence, the red node in line i-2
and the black node directly underneath in line i-3
can be updated for the second time, and so on,
until the red node in line i-2*m+2 and the
black node in line i-2*m+1 are updated for the
m-th time.
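The update order described above can be sketched in a few lines. The following is a minimal Python illustration, not the Fortran code of rb6.F. It assumes a square grid with fixed boundary lines 0 and n+1, a colouring in which node (i, j) is red when i + j is even, and a relaxation step that replaces a node by the mean of its four neighbours. It also works on whole lines rather than single node pairs. All function names are hypothetical.

```python
RED, BLACK = 0, 1  # assumed colouring: node (i, j) is red when i + j is even


def make_grid(n):
    """Return an (n+2) x (n+2) grid with arbitrary but reproducible values."""
    return [[float((7 * i + 3 * j) % 11) for j in range(n + 2)]
            for i in range(n + 2)]


def update_line(u, i, colour, n):
    """Relax every interior node of one colour in line i; skip lines outside 1..n."""
    if not 1 <= i <= n:
        return
    start = 1 if (i + 1) % 2 == colour else 2
    for j in range(start, n + 1, 2):
        # Assumed stencil: mean of the four direct neighbours.
        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])


def plain_sweeps(u, n, m):
    """Reference: m separate red-black sweeps over the whole grid."""
    for _ in range(m):
        for colour in (RED, BLACK):
            for i in range(1, n + 1):
                update_line(u, i, colour, n)


def fused_sweeps(u, n, m):
    """One pass over the grid in which every node receives all m updates."""
    for i in range(1, n + 2 * m):                   # extra steps drain the pipeline
        for k in range(1, m + 1):                   # k-th update, earliest first
            red_line = i - 2 * (k - 1)              # k = m gives i - 2*m + 2
            update_line(u, red_line, RED, n)
            update_line(u, red_line - 1, BLACK, n)  # k = m gives i - 2*m + 1


def max_difference(n, m):
    """Largest difference between m plain sweeps and one fused pass."""
    a, b = make_grid(n), make_grid(n)
    plain_sweeps(a, n, m)
    fused_sweeps(b, n, m)
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Calling max_difference(6, 3) compares three separate sweeps with one fused pass on a 6 x 6 interior. Under the assumptions above both orders read the same operand values for every update, so the two results should agree.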
Comment:
The data locality of the algorithm should be slightly
better than that of rb5.F, because the data in lines
i-2,...,i-2*m+1 is reused earlier. Hence, there
is a higher chance that the data is still in
the cache. Also, because the data is reused earlier,
less data must be held in the cache
simultaneously.
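To give a rough feeling for how much data is in use at once, the sketch below estimates the number of grid lines touched while the sweep is at line i. It is a back-of-the-envelope Python illustration under stated assumptions, not a measurement: a stencil that reads the lines directly above and below the updated node, n interior nodes plus two boundary nodes per line, and 8-byte values. It counts only the solution array; any further array, such as a right-hand side, would add to the total. The function names are hypothetical.

```python
def window_lines(m):
    """Lines touched in one step: from i+1 (read by the red update in line i)
    down to i-2*m (read by the black update in line i-2*m+1)."""
    return 2 * m + 2


def window_bytes(n, m, bytes_per_value=8):
    """Approximate size of those lines, assuming n+2 values per line."""
    return window_lines(m) * (n + 2) * bytes_per_value
```

For example, window_bytes(1024, 2) gives 49248 bytes and window_bytes(2048, 3) gives 131200 bytes. Whether such a window fits into a given cache level depends on the machine, which is not specified here.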
Results:
Memory access behaviour

Size | MBytes/sec | % of all accesses which go into
     |            |    ± | 1. Level | 2. Level | 3. Level | Memory

2 sweeps performed together
  16 |     1275.5 | 20.6 |     79.4 |      0.1 |      0.0 |    0.0
  32 |     1251.8 | 20.8 |     77.8 |      1.3 |      0.0 |    0.0
  64 |     1049.0 | 21.0 |     68.9 |      9.8 |      0.3 |    0.0
 128 |     1021.0 | 21.1 |     61.7 |     14.4 |      2.7 |    0.0
 256 |      910.8 | 21.2 |     30.8 |     45.1 |      2.7 |    0.1
 512 |      613.7 | 21.1 |     30.6 |     45.0 |      1.5 |    1.8
1024 |      509.5 | 21.2 |     36.7 |     36.2 |      4.1 |    1.8
2048 |      380.4 | 20.8 |     26.2 |     38.6 |     12.5 |    1.8

3 sweeps performed together
  16 |     1431.7 | 21.5 |     78.4 |      0.0 |      0.0 |    0.0
  32 |     1374.6 | 22.7 |     60.1 |     17.2 |      0.0 |    0.0
  64 |     1166.3 | 22.8 |     60.7 |     16.3 |      0.2 |    0.0
 128 |     1131.5 | 23.5 |     45.4 |     29.1 |      1.9 |    0.0
 256 |     1060.1 | 23.7 |     30.1 |     44.3 |      1.8 |    0.0
 512 |      759.2 | 23.7 |     29.6 |     44.5 |      1.0 |    1.2
1024 |      516.5 | 23.5 |     35.9 |     32.7 |      6.7 |    1.2
2048 |      393.1 | 23.3 |     27.5 |     34.1 |     13.8 |    1.2
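As a reading aid for the memory access table above, the following Python sketch adds up the five percentage columns (±, 1. Level, 2. Level, 3. Level, Memory) of every row. The figures are copied from the table; the helper name percentage_sums is hypothetical. The check makes no statement about what the ± column means, only that the five columns together account for all accesses.

```python
# Rows copied from the table: (size, ±, 1. Level, 2. Level, 3. Level, Memory)
ROWS = {
    2: [(16, 20.6, 79.4, 0.1, 0.0, 0.0), (32, 20.8, 77.8, 1.3, 0.0, 0.0),
        (64, 21.0, 68.9, 9.8, 0.3, 0.0), (128, 21.1, 61.7, 14.4, 2.7, 0.0),
        (256, 21.2, 30.8, 45.1, 2.7, 0.1), (512, 21.1, 30.6, 45.0, 1.5, 1.8),
        (1024, 21.2, 36.7, 36.2, 4.1, 1.8), (2048, 20.8, 26.2, 38.6, 12.5, 1.8)],
    3: [(16, 21.5, 78.4, 0.0, 0.0, 0.0), (32, 22.7, 60.1, 17.2, 0.0, 0.0),
        (64, 22.8, 60.7, 16.3, 0.2, 0.0), (128, 23.5, 45.4, 29.1, 1.9, 0.0),
        (256, 23.7, 30.1, 44.3, 1.8, 0.0), (512, 23.7, 29.6, 44.5, 1.0, 1.2),
        (1024, 23.5, 35.9, 32.7, 6.7, 1.2), (2048, 23.3, 27.5, 34.1, 13.8, 1.2)],
}


def percentage_sums(rows):
    """Return (size, sum of the five percentage columns) for each row."""
    return [(size, round(sum(parts), 1)) for size, *parts in rows]


for sweeps, rows in ROWS.items():
    for size, total in percentage_sums(rows):
        print(f"{sweeps} sweeps, size {size}: {total}")
```

Every row sums to 100 within rounding of the printed figures.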
Runtime behaviour

Size | MFlops/sec | % of cycles used for
     |            |    ± |  Base | Exec | Cache | DTB | Branch | R dep | Nops

2 sweeps performed together
  16 |      286.7 |  7.0 | 102.3 | 49.2 |   0.2 | 0.3 |    0.9 |  42.4 |  2.3
  32 |      282.3 | 10.8 | 104.8 | 47.7 |   3.9 | 0.2 |    0.5 |  39.6 |  2.1
  64 |      237.0 |  4.7 | 102.2 | 42.1 |   8.8 | 0.1 |    0.0 |  43.5 |  3.0
 128 |      231.2 |  8.9 | 105.2 | 41.7 |  17.7 | 0.2 |    0.0 |  33.8 |  2.9
 256 |      206.5 |  6.7 | 101.0 | 37.4 |  16.9 | 5.2 |    0.0 |  32.7 |  2.1
 512 |      139.0 |  4.7 | 100.7 | 25.2 |  44.1 | 3.6 |    0.0 |  21.7 |  1.4
1024 |      115.5 |  4.8 | 100.3 | 21.0 |  45.8 | 4.7 |    0.0 |  23.4 |  0.6
2048 |       85.8 |  2.9 | 100.4 | 16.5 |  58.8 | 4.6 |    0.0 |  16.7 |  0.9

3 sweeps performed together
  16 |      325.7 |  5.7 | 106.1 | 50.6 |   0.3 | 1.0 |    0.9 |  43.7 |  3.9
  32 |      317.4 |  5.8 | 105.1 | 49.7 |   4.7 | 0.4 |    0.3 |  40.7 |  3.5
  64 |      269.9 |  4.6 | 103.4 | 45.0 |   8.2 | 0.1 |    0.0 |  41.0 |  4.5
 128 |      264.3 |  4.4 | 104.5 | 44.9 |  14.6 | 0.3 |    0.0 |  36.3 |  4.0
 256 |      248.1 |  6.9 | 104.4 | 42.3 |  13.1 | 4.3 |    0.0 |  34.0 |  3.8
 512 |      177.6 |  4.8 | 102.2 | 30.0 |  37.3 | 3.2 |    0.0 |  24.2 |  2.7
1024 |      120.6 |  5.7 | 100.8 | 21.1 |  46.0 | 4.8 |    0.0 |  22.0 |  1.2
2048 |       91.5 |  4.2 | 100.2 | 17.0 |  57.2 | 4.2 |    0.0 |  17.1 |  0.5
Table explanation