Data Local Iterative Methods For The Efficient Solution of Partial Differential Equations

A cooperation between LSS and LRR. Funded by DFG.

rb5.F Program Description

File Name:

rb5.F

Description:

The algorithm is a slightly altered form of the algorithm rb4.F. We move through the grid updating all the nodes m times instead of once, preserving the data dependencies given by the standard red-black Gauss-Seidel algorithm. However, instead of doing a complete sweep through one line i and then sequentially updating the lines i-1, ..., i-m*2+1 underneath it as in rb4.F, we move simultaneously through line i and line i-1, updating a red point in line i and the black point in line i-1 directly underneath it. Then we update the lines i-2 and i-3 in the same way, and so on, until we reach the lines i-m*2+2 and i-m*2+1.
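The paired-line traversal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of rb5.F: the 5-point Laplacian update in relax, the convention that red points are those with (i + j) even, and the function names sweeps_standard and sweeps_paired are all hypothetical choices made for this sketch. The reference routine is included so the fused ordering can be compared against m ordinary red-black sweeps.

```python
def relax(u, f, h2, i, j):
    # One Gauss-Seidel update; the 5-point Laplacian stencil is an assumption.
    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                      + h2 * f[i][j])


def sweeps_standard(u, f, h2, m):
    """Reference: m plain red-black sweeps (red = (i + j) even, assumed)."""
    n = len(u) - 2  # interior points per direction; rows 0 and n+1 are boundary
    for _ in range(m):
        for colour in (0, 1):  # 0 = red, 1 = black
            for i in range(1, n + 1):
                for j in range(1, n + 1):
                    if (i + j) % 2 == colour:
                        relax(u, f, h2, i, j)


def sweeps_paired(u, f, h2, m):
    """m sweeps fused in the paired-line order described in the text."""
    n = len(u) - 2
    for i in range(1, n + 2 * m):            # wavefront line; runs past n to drain
        for k in range(m):                   # k-th fused sweep (0-based)
            r, b = i - 2 * k, i - 2 * k - 1  # red line and the black line below it
            if r < 1 or b > n:
                continue                     # pair lies completely outside the grid
            first = 2 - r % 2                # first column j with (r + j) even
            for j in range(first, n + 1, 2):
                if r <= n:
                    relax(u, f, h2, r, j)    # red point in line r
                if b >= 1:
                    relax(u, f, h2, b, j)    # black point directly underneath


if __name__ == "__main__":
    n, m = 6, 3
    f = [[1.0] * (n + 2) for _ in range(n + 2)]
    a = [[float((3 * i + 5 * j) % 7) for j in range(n + 2)] for i in range(n + 2)]
    b = [row[:] for row in a]
    sweeps_standard(a, f, 1.0, m)
    sweeps_paired(b, f, 1.0, m)
    print(a == b)  # compares the two orderings
```

Each point receives the same operands in both orderings, so the two routines are expected to leave the same grid; only the order in which memory is visited differs.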

Comment:

rb5.F combines the properties of rb2.F and rb4.F. Locality is thus improved in two ways: by fusing the red and black sweeps (which improves both cache and register usage) and by fusing successive red-black Gauss-Seidel sweeps.
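To make the locality argument a little more concrete, the sketch below lists which grid lines are in flight for one position of the traversal. The helper names lines_in_flight and working_set are hypothetical, grid boundaries are ignored, and the one-line halo on either side assumes a 5-point stencil, which this page does not state explicitly.

```python
def lines_in_flight(i, m):
    """(red_line, black_line) pairs updated while the traversal is at line i."""
    return [(i - 2 * k, i - 2 * k - 1) for k in range(m)]


def working_set(i, m):
    """Lines read or written at position i (grid boundaries ignored)."""
    # Updated lines span i-2m+1 .. i; an assumed 5-point stencil also
    # reads one extra line above and one below that span.
    return range(i - 2 * m, i + 2)


if __name__ == "__main__":
    print(lines_in_flight(10, 3))
    print(list(working_set(10, 3)))
```

Under these assumptions the number of lines touched per position is 2m + 2, independent of the grid size, which is why fusing more sweeps helps only as long as that many lines still fit in cache.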

Results:

Memory access behaviour

                        % of all accesses which go into
Size   MBytes/sec       ±   1. Level   2. Level   3. Level   Memory

2 sweeps performed together
  16       1766.6    19.4       80.6        0.0        0.0      0.0
  32       1188.6    56.6       32.0       11.4        0.0      0.0
  64       2806.3    17.8       75.2        6.8        0.2      0.0
 128       1780.3    21.3       57.5       18.6        2.5      0.0
 256       1603.8    21.5       37.1       38.7        2.7      0.0
 512        795.7    21.1       35.1       40.2        1.8      1.8
1024        656.4    21.1       29.1       43.6        4.4      1.8
2048        397.1    20.8       27.5       35.4       14.3      1.9

3 sweeps performed together
  16       1807.1    18.8       81.2        0.0        0.0      0.0
  32       1987.4    20.1       78.1        1.7        0.1      0.0
  64       2594.2    17.4       71.4       11.0        0.2      0.0
 128       1788.3    21.0       50.9       25.8        2.3      0.0
 256       1625.6    21.2       36.7       40.2        1.9      0.0
 512       1003.0    21.0       36.4       39.7        1.6      1.2
1024        667.4    21.0       28.4       42.4        7.0      1.2
2048        409.6    20.9       27.2       35.6       15.1      1.2

Runtime behaviour

                        % of cycles used for
Size   MFlops/sec      ±    Base    Exec   Cache    DTB   Branch   R dep   Nops

2 sweeps performed together
  16        391.3    1.7   121.4    70.2     0.3    4.3      7.1    26.9   10.9
  32        489.2    1.6   126.2    81.9     6.5    3.5      4.4    13.8   14.5
  64        609.9    0.3   144.2   104.4     4.4    6.1      6.0     7.6   15.4
 128        404.1   -2.5   124.2    73.1    31.1    7.3      2.8     0.0   12.4
 256        364.6   -2.4   117.2    61.8    37.6    7.3      2.4     0.0   10.5
 512        180.1   -1.1   106.8    30.5    62.6    8.4      1.2     0.0    5.2
1024        148.6   -0.7   105.1    25.0    72.0    3.6      1.0     0.0    4.2
2048         89.6   -0.1   102.6    15.5    82.6    1.5      0.6     0.0    2.5

3 sweeps performed together
  16        397.4    2.4   119.1    67.1     0.2    3.9      5.9    29.9    9.7
  32        444.3    1.5   119.5    73.0    12.8    3.6      3.5    12.1   13.0
  64        561.0   -0.4   133.5    95.4     7.8    4.5      5.0     7.1   14.1
 128        404.3   -1.9   123.1    71.1    29.9    7.1      2.8     2.4   11.7
 256        368.2   -1.9   119.4    62.6    38.7    7.0      2.4     0.0   10.6
 512        226.9   -1.2   107.7    36.8    58.4    6.1      1.4     0.0    6.2
1024        150.9   -0.7   105.7    25.0    73.1    3.1      1.0     0.0    4.2
2048         92.5   -0.2   102.8    16.5    81.7    1.5      0.6     0.0    2.7

Table explanation

cs10-dime@fau.de
Last Modified: 10 January 2008