rb5.F Program Description
File Name: rb5.F
Description:
The algorithm is a slightly altered form of the
algorithm in rb4.F.
We move through the grid updating every node
m times instead of once, while preserving the data
dependencies of the standard red-black
Gauss-Seidel algorithm. However, instead of performing a
complete sweep through line i and then
sequentially updating the lines i-1, ...,
i-m*2+1 underneath it, as in rb4.F, we move through
line i and line i-1 simultaneously, updating a red point
in line i and the black point in line i-1 directly
underneath it. Then we update the lines i-2 and i-3 in
the same way, and so on, until we reach the lines
i-m*2+2 and i-m*2+1.
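The traversal described above can be sketched as follows. This is a minimal illustration in Python, not the Fortran code of rb5.F: the function names (`relax`, `sweeps_standard`, `sweeps_fused`, `make_grid`), the 5-point Laplace stencil, the choice of red as the points with even i+j, and the use of bounds checks in place of special-cased start and end phases are all assumptions made for the example. The reference routine performs m ordinary red-black sweeps so that the fused ordering can be compared against it.

```python
def relax(u, f, h2, i, j):
    """Gauss-Seidel update of one interior node (assumed 5-point Laplace stencil)."""
    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                      + u[i][j - 1] + u[i][j + 1] + h2 * f[i][j])


def sweeps_standard(u, f, h2, m):
    """Reference: m red-black sweeps, each colour in its own pass over the grid."""
    n = len(u) - 2                      # interior lines and columns are 1..n
    for _ in range(m):
        for colour in (0, 1):           # 0 = red ((i + j) even), 1 = black
            for i in range(1, n + 1):
                for j in range(1, n + 1):
                    if (i + j) % 2 == colour:
                        relax(u, f, h2, i, j)


def sweeps_fused(u, f, h2, m):
    """m red-black sweeps fused into a single pass, line pairs interleaved."""
    n = len(u) - 2
    for i in range(1, n + 2 * m):       # extra iterations finish the last line pairs
        for s in range(m):              # fused sweep s works on lines i-2s and i-2s-1
            red_line = i - 2 * s
            black_line = red_line - 1
            if red_line < 1 or black_line > n:
                continue                # this pair lies completely outside the grid
            # Columns j for which (red_line, j) is red, i.e. red_line + j is even.
            for j in range(2 - red_line % 2, n + 1, 2):
                if red_line <= n:
                    relax(u, f, h2, red_line, j)      # red point in the upper line
                if black_line >= 1:
                    relax(u, f, h2, black_line, j)    # black point directly underneath


def make_grid(n):
    """Small deterministic test problem; the outermost lines act as fixed boundary."""
    u = [[float((3 * i + 5 * j) % 7) for j in range(n + 2)] for i in range(n + 2)]
    f = [[float((i * j) % 3) for j in range(n + 2)] for i in range(n + 2)]
    return u, f


if __name__ == "__main__":
    u_ref, f = make_grid(6)
    u_fused = [row[:] for row in u_ref]
    sweeps_standard(u_ref, f, 1.0, 2)
    sweeps_fused(u_fused, f, 1.0, 2)
    # Largest difference between the two orderings after m = 2 sweeps.
    print(max(abs(a - b) for ra, rb in zip(u_ref, u_fused) for a, b in zip(ra, rb)))
```

In this sketch the black point in line i-1 is updated immediately after the red point above it, so both values are still close at hand when they are reused, and the lower line pairs are revisited while they are presumably still in cache.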
Comment:
rb5.F combines the properties of rb2.F and rb4.F. Locality is thus improved
both by fusing the red and black sweeps (which improves
cache and register usage) and by fusing successive
red-black Gauss-Seidel sweeps.
Results:
Memory access behaviour
The columns ± to Memory give the % of all accesses which go into each level.

2 sweeps performed together

| Size | MBytes/sec |    ± | 1. Level | 2. Level | 3. Level | Memory |
|-----:|-----------:|-----:|---------:|---------:|---------:|-------:|
|   16 |     1766.6 | 19.4 |     80.6 |      0.0 |      0.0 |    0.0 |
|   32 |     1188.6 | 56.6 |     32.0 |     11.4 |      0.0 |    0.0 |
|   64 |     2806.3 | 17.8 |     75.2 |      6.8 |      0.2 |    0.0 |
|  128 |     1780.3 | 21.3 |     57.5 |     18.6 |      2.5 |    0.0 |
|  256 |     1603.8 | 21.5 |     37.1 |     38.7 |      2.7 |    0.0 |
|  512 |      795.7 | 21.1 |     35.1 |     40.2 |      1.8 |    1.8 |
| 1024 |      656.4 | 21.1 |     29.1 |     43.6 |      4.4 |    1.8 |
| 2048 |      397.1 | 20.8 |     27.5 |     35.4 |     14.3 |    1.9 |

3 sweeps performed together

| Size | MBytes/sec |    ± | 1. Level | 2. Level | 3. Level | Memory |
|-----:|-----------:|-----:|---------:|---------:|---------:|-------:|
|   16 |     1807.1 | 18.8 |     81.2 |      0.0 |      0.0 |    0.0 |
|   32 |     1987.4 | 20.1 |     78.1 |      1.7 |      0.1 |    0.0 |
|   64 |     2594.2 | 17.4 |     71.4 |     11.0 |      0.2 |    0.0 |
|  128 |     1788.3 | 21.0 |     50.9 |     25.8 |      2.3 |    0.0 |
|  256 |     1625.6 | 21.2 |     36.7 |     40.2 |      1.9 |    0.0 |
|  512 |     1003.0 | 21.0 |     36.4 |     39.7 |      1.6 |    1.2 |
| 1024 |      667.4 | 21.0 |     28.4 |     42.4 |      7.0 |    1.2 |
| 2048 |      409.6 | 20.9 |     27.2 |     35.6 |     15.1 |    1.2 |
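A quick way to sanity-check the memory access figures is to add up the five percentage columns of each row. The sketch below does this for the rows of the "2 sweeps performed together" block, copied from the table above. It assumes that the columns ±, 1. Level, 2. Level, 3. Level and Memory are meant to partition all accesses, so that each row should total about 100 up to rounding. The names `ACCESS_SHARES` and `inconsistent_rows` and the tolerance are hypothetical choices for the example.

```python
# Rows of the "2 sweeps performed together" block of the memory access table:
# size -> (±, 1. Level, 2. Level, 3. Level, Memory), each in % of all accesses.
ACCESS_SHARES = {
    16:   (19.4, 80.6, 0.0, 0.0, 0.0),
    32:   (56.6, 32.0, 11.4, 0.0, 0.0),
    64:   (17.8, 75.2, 6.8, 0.2, 0.0),
    128:  (21.3, 57.5, 18.6, 2.5, 0.0),
    256:  (21.5, 37.1, 38.7, 2.7, 0.0),
    512:  (21.1, 35.1, 40.2, 1.8, 1.8),
    1024: (21.1, 29.1, 43.6, 4.4, 1.8),
    2048: (20.8, 27.5, 35.4, 14.3, 1.9),
}


def inconsistent_rows(table, tolerance=0.15):
    """Return the sizes whose percentage columns do not total 100 within tolerance."""
    return [size for size, shares in table.items()
            if abs(sum(shares) - 100.0) > tolerance]


if __name__ == "__main__":
    for size, shares in ACCESS_SHARES.items():
        print(size, round(sum(shares), 1))
    print("rows outside tolerance:", inconsistent_rows(ACCESS_SHARES))
```

The tolerance of 0.15 allows for the one-decimal rounding of the published figures.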
Runtime behaviour
The columns ± to Nops give the % of cycles used for each category.

2 sweeps performed together

| Size | MFlops/sec |    ± |  Base |  Exec | Cache | DTB | Branch | R dep | Nops |
|-----:|-----------:|-----:|------:|------:|------:|----:|-------:|------:|-----:|
|   16 |      391.3 |  1.7 | 121.4 |  70.2 |   0.3 | 4.3 |    7.1 |  26.9 | 10.9 |
|   32 |      489.2 |  1.6 | 126.2 |  81.9 |   6.5 | 3.5 |    4.4 |  13.8 | 14.5 |
|   64 |      609.9 |  0.3 | 144.2 | 104.4 |   4.4 | 6.1 |    6.0 |   7.6 | 15.4 |
|  128 |      404.1 | -2.5 | 124.2 |  73.1 |  31.1 | 7.3 |    2.8 |   0.0 | 12.4 |
|  256 |      364.6 | -2.4 | 117.2 |  61.8 |  37.6 | 7.3 |    2.4 |   0.0 | 10.5 |
|  512 |      180.1 | -1.1 | 106.8 |  30.5 |  62.6 | 8.4 |    1.2 |   0.0 |  5.2 |
| 1024 |      148.6 | -0.7 | 105.1 |  25.0 |  72.0 | 3.6 |    1.0 |   0.0 |  4.2 |
| 2048 |       89.6 | -0.1 | 102.6 |  15.5 |  82.6 | 1.5 |    0.6 |   0.0 |  2.5 |

3 sweeps performed together

| Size | MFlops/sec |    ± |  Base |  Exec | Cache | DTB | Branch | R dep | Nops |
|-----:|-----------:|-----:|------:|------:|------:|----:|-------:|------:|-----:|
|   16 |      397.4 |  2.4 | 119.1 |  67.1 |   0.2 | 3.9 |    5.9 |  29.9 |  9.7 |
|   32 |      444.3 |  1.5 | 119.5 |  73.0 |  12.8 | 3.6 |    3.5 |  12.1 | 13.0 |
|   64 |      561.0 | -0.4 | 133.5 |  95.4 |   7.8 | 4.5 |    5.0 |   7.1 | 14.1 |
|  128 |      404.3 | -1.9 | 123.1 |  71.1 |  29.9 | 7.1 |    2.8 |   2.4 | 11.7 |
|  256 |      368.2 | -1.9 | 119.4 |  62.6 |  38.7 | 7.0 |    2.4 |   0.0 | 10.6 |
|  512 |      226.9 | -1.2 | 107.7 |  36.8 |  58.4 | 6.1 |    1.4 |   0.0 |  6.2 |
| 1024 |      150.9 | -0.7 | 105.7 |  25.0 |  73.1 | 3.1 |    1.0 |   0.0 |  4.2 |
| 2048 |       92.5 | -0.2 | 102.8 |  16.5 |  81.7 | 1.5 |    0.6 |   0.0 |  2.7 |