Data Local Iterative Methods For The Efficient Solution of Partial Differential Equations

A cooperation between LSS and LRR, funded by the DFG.

rb3.F Program Description

File Name:

rb3.F

Description:

Like rb2.F, this algorithm makes use of the fact that the black nodes in row i-1 can be updated once the red nodes in row i are up to date. Consequently, it works on pairs of rows: as soon as all red nodes in one row have been updated, all black nodes in the previous row are updated. The first and last rows of the grid need special care, because the first row has no previous row and the black nodes of the last row are not followed by another red row.
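The row pairing described above can be sketched as follows. This is a minimal illustration in Python, not the rb3.F source. It assumes a 5-point stencil on a square grid with one layer of boundary nodes, and it assumes that nodes with even i+j are red. The names `update_row`, `sweep_two_pass`, `sweep_paired_rows` and `make_grid` are hypothetical.

```python
RED, BLACK = 0, 1  # assumed colouring: (i + j) % 2 == 0 is red


def update_row(u, f, h2, i, colour):
    """Relax all nodes of one colour in interior row i (5-point stencil)."""
    n = len(u) - 2
    # first interior column j for which (i + j) % 2 == colour
    start = 1 + ((i + 1 + colour) % 2)
    for j in range(start, n + 1, 2):
        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                          + u[i][j - 1] + u[i][j + 1] + h2 * f[i][j])


def sweep_two_pass(u, f, h2):
    """Plain red-black sweep: all red rows first, then all black rows."""
    n = len(u) - 2
    for i in range(1, n + 1):
        update_row(u, f, h2, i, RED)
    for i in range(1, n + 1):
        update_row(u, f, h2, i, BLACK)


def sweep_paired_rows(u, f, h2):
    """Paired-row sweep: red nodes of row i, then black nodes of row i-1."""
    n = len(u) - 2
    for i in range(1, n + 1):
        update_row(u, f, h2, i, RED)
        if i > 1:  # first row: there is no previous row to finish
            update_row(u, f, h2, i - 1, BLACK)
    update_row(u, f, h2, n, BLACK)  # last row: black nodes still pending


def make_grid(n):
    """Zero interior, boundary value 1.0 on the top edge, zero right-hand side."""
    u = [[0.0] * (n + 2) for _ in range(n + 2)]
    u[0] = [1.0] * (n + 2)
    f = [[0.0] * (n + 2) for _ in range(n + 2)]
    return u, f
```

Both sweeps perform the same node updates with the same inputs, so under these assumptions they should produce identical grids; only the order in which rows are visited differs.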

Comment:

Like rb2.F, the algorithm eliminates one of the grid transfers from main memory to the cache. However, it cannot keep the values of a red/black node pair in registers long enough to reuse them during the update of the black nodes.
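The locality argument can be made concrete with a small model. The sketch below is an illustration only, not a cache simulation and not part of rb3.F. It lists the order of row updates for a plain two-pass sweep and for the paired-row sweep, then measures how many row updates lie between the red and the black update of the same row. The function names are hypothetical.

```python
def two_pass_schedule(n):
    """Row updates of a plain red-black sweep: all red rows, then all black rows."""
    return ([(i, "red") for i in range(1, n + 1)]
            + [(i, "black") for i in range(1, n + 1)])


def paired_schedule(n):
    """Row updates of the paired-row sweep: red row i, then black row i-1."""
    steps = []
    for i in range(1, n + 1):
        steps.append((i, "red"))
        if i > 1:
            steps.append((i - 1, "black"))
    steps.append((n, "black"))
    return steps


def max_red_black_gap(schedule):
    """Largest number of steps between the red and black update of one row."""
    position = {step: k for k, step in enumerate(schedule)}
    rows = {row for row, _ in schedule}
    return max(position[(row, "black")] - position[(row, "red")] for row in rows)
```

In this model the gap of the two-pass sweep equals the number of rows, while the gap of the paired-row sweep stays at three row updates or fewer regardless of grid size. This is consistent with a row still being in cache when its black nodes are updated, although whether it actually is depends on the cache and row sizes.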

Results:

Memory access behaviour

Size   MBytes/sec   % of all accesses which go into
                    ±      1. Level   2. Level   3. Level   Memory
  16     1825.0     3.3      96.7        0.0        0.0       0.0
  32     2161.0     4.1      75.3       20.5        0.0       0.0
  64     2888.9     0.6      86.2       12.8        0.4       0.0
 128     1736.2     1.4      83.8        7.9        6.8       0.0
 256     1294.8     7.1      48.1       38.3        6.5       0.0
 512      565.4     6.8      29.5       57.1        2.9       3.6
1024      517.3     6.6      25.0       59.8        5.1       3.6
2048      444.5     4.5      30.7       47.8       13.3       3.6

Runtime behaviour

Size   MFlops/sec   % of cycles used for
                    ±      Base    Exec    Cache   DTB    Branch   R dep   Nops
  16      337.0     5.2    118.9    68.5     0.3    3.9     8.7     27.2    5.1
  32      402.4     3.3    122.9    76.6    10.5    5.0     6.4     14.0    7.1
  64      519.2    -0.3    151.2   127.0     1.3   16.3     0.0      0.0    6.9
 128      314.6    -0.7    114.4    67.7     7.3   36.4     0.0      0.0    3.7
 256      248.9     0.0    111.0    46.3    51.7    7.8     0.1      0.0    5.1
 512      108.3     0.0    103.9    19.9    78.7    3.3     0.1      0.0    1.9
1024       98.9     0.0    103.0    18.2    80.1    2.9     0.0      0.0    1.8
2048       83.2     0.0    102.5    16.0    81.6    2.5     0.0      0.0    2.4

cs10-dime@fau.de
Last Modified: 10 January 2008