rb3.F Program Description
File Name: rb3.F
Description:
Like rb2.F, this algorithm
makes use of the fact that the black nodes in row
i-1 can be updated as soon as the red nodes in row
i are up to date. Consequently, it works on
pairs of rows: once all of the red nodes in one row
have been updated, all of the black nodes in the
previous row are updated. Care must be taken in the
first and last rows of the grid: the first row has no
previous row to update, and the black nodes of the
last row must be updated separately after the sweep.
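The pairing of rows described above can be sketched as follows. This is not the rb3.F source. It is a minimal Python illustration that assumes a square grid with a one-node boundary layer, a 5-point averaging stencil with zero right-hand side, and red nodes defined by (i + j) even. All function names are hypothetical.

```python
RED, BLACK = 0, 1  # assumption: node (i, j) is red when (i + j) is even


def update_row(u, i, color):
    """Relax all interior nodes of one color in row i (5-point average)."""
    n = len(u) - 2                     # interior nodes per row and column
    start = 1 + (i + 1 + color) % 2    # first interior j with (i + j) % 2 == color
    for j in range(start, n + 1, 2):
        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])


def sweep_two_pass(u):
    """Reference sweep: all red nodes first, then all black nodes."""
    n = len(u) - 2
    for i in range(1, n + 1):
        update_row(u, i, RED)
    for i in range(1, n + 1):
        update_row(u, i, BLACK)


def sweep_fused(u):
    """Row-pair sweep: red nodes in row i, then black nodes in row i-1."""
    n = len(u) - 2
    update_row(u, 1, RED)              # first row: no previous row to update
    for i in range(2, n + 1):
        update_row(u, i, RED)
        update_row(u, i - 1, BLACK)    # all red neighbours are now up to date
    update_row(u, n, BLACK)            # last row: black nodes still pending


def make_grid(n):
    """(n+2) x (n+2) grid filled with arbitrary test values."""
    return [[float((3 * i + 5 * j) % 7) for j in range(n + 2)]
            for i in range(n + 2)]


if __name__ == "__main__":
    a = make_grid(6)
    b = [row[:] for row in a]
    sweep_two_pass(a)
    sweep_fused(b)
    print("fused sweep matches two-pass sweep:", a == b)
```

In this sketch both sweeps apply the same update to each node with the same neighbour values, so they should produce identical grids. Only the order in which rows are visited changes, which is what allows the grid to pass through the cache once per sweep instead of twice.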
Comment:
Like rb2.F, the algorithm
saves one of the grid transfers from main
memory to the cache. However, it is not
able to keep the values of a red/black node pair in
registers long enough to reuse them during the update
of the black nodes.
Results:
Memory access behaviour

     |            | % of all accesses which go into
Size | MBytes/sec |    ± | 1. Level | 2. Level | 3. Level | Memory
  16 |     1825.0 |  3.3 |     96.7 |      0.0 |      0.0 |    0.0
  32 |     2161.0 |  4.1 |     75.3 |     20.5 |      0.0 |    0.0
  64 |     2888.9 |  0.6 |     86.2 |     12.8 |      0.4 |    0.0
 128 |     1736.2 |  1.4 |     83.8 |      7.9 |      6.8 |    0.0
 256 |     1294.8 |  7.1 |     48.1 |     38.3 |      6.5 |    0.0
 512 |      565.4 |  6.8 |     29.5 |     57.1 |      2.9 |    3.6
1024 |      517.3 |  6.6 |     25.0 |     59.8 |      5.1 |    3.6
2048 |      444.5 |  4.5 |     30.7 |     47.8 |     13.3 |    3.6
Runtime behaviour

     |            | % of cycles used for
Size | MFlops/sec |    ± |  Base |  Exec | Cache |  DTB | Branch | R dep | Nops
  16 |      337.0 |  5.2 | 118.9 |  68.5 |   0.3 |  3.9 |    8.7 |  27.2 |  5.1
  32 |      402.4 |  3.3 | 122.9 |  76.6 |  10.5 |  5.0 |    6.4 |  14.0 |  7.1
  64 |      519.2 | -0.3 | 151.2 | 127.0 |   1.3 | 16.3 |    0.0 |   0.0 |  6.9
 128 |      314.6 | -0.7 | 114.4 |  67.7 |   7.3 | 36.4 |    0.0 |   0.0 |  3.7
 256 |      248.9 |  0.0 | 111.0 |  46.3 |  51.7 |  7.8 |    0.1 |   0.0 |  5.1
 512 |      108.3 |  0.0 | 103.9 |  19.9 |  78.7 |  3.3 |    0.1 |   0.0 |  1.9
1024 |       98.9 |  0.0 | 103.0 |  18.2 |  80.1 |  2.9 |    0.0 |   0.0 |  1.8
2048 |       83.2 |  0.0 | 102.5 |  16.0 |  81.6 |  2.5 |    0.0 |   0.0 |  2.4