banner image

Data Local Iterative Methods For The Efficient Solution of Partial Differential Equations

logo
home
staff
coorperations
publications
talks
tutorials
software
results
contact

A cooperation
between
lss logo
and
lrr logo.

Funded by
dfg logo.

rb4.F Program Description

File Name:

rb4.F

Description:

The algorithms rb2.F and rb3.F both try to reuse the data of the red update sweep in the black sweep soon enough so that the data is still in the cache. They do this by melting the operations for the red and black update sweeps together. However, if several successive red-black Gauss-Seidel relaxations are performed the data in the cache is not reused between different relaxation sweeps if the grid is to big to fit in the cache. The present algorithm addresses this issue.

Beginning of Sweep Assume that we melt two red-black Gauss-Seidel sweeps together. We start in a situation where we have updated the first rows in pairs (see figure above; circles indicate number of performed relaxations). In this situation we are able to update the red points in row 1 the second time. We now continue to update the red nodes in row 4. Once all of the red nodes in row 4 have been updated all of the black nodes in row 3 can be updated. Then all red nodes in row 2 are updated the second time and as a consequence also the black nodes in row 1 are updated the second time.

Staircase situation We now are in a wavefront like situation where we have updated some of the lines once and some twice. Then we update the (red) nodes in line 5 and cascade down three rows updating alternately black and red nodes. In this way, we move the wavefront throught the grid until we reach the last row of the grid (step 9). The height of the wavefront is the number of simultaneously performed update sweeps. The width of the wavefornt is twice the number of update sweeps. In the shown case of two simultaniously performed update sweeps this means that the wavefront is 4 rows wide. At the top of the grid (step 10 to 15) the wavefront shrinks down to one single row. So that at the end of the relaxation sweep only the black nodes in the last row remains for update.

The technique can be generalized so that the nodes can be updated m times in one sweep throught the grid obtaining the same results we would get if m sweps of the standard red-black Gauss-Seidel algorithm were applied.

Comment:

Instead of doing m successive sweeps through the grid updating each node once we do just one sweep throught the grid updating each node m times. Hence, the grid is transfered from main memory to the cache once instead of m times. However, this is only correct if m*2+2 rows of the grid fit in the cache.

Results:

Memory access behaviour
Size MBytes
/sec
% of all access which go into
± 1. Level 2. Level 3. Level Memory
2 sweeps performed together
16 1769.8 4.8 95.1 0.0 0.0 0.0
32 2254.1 4.3 80.9 14.8 0.0 0.0
64 3030.3 0.1 90.4 9.3 0.2 0.0
128 2012.5 0.0 72.2 24.3 3.5 0.0
256 1535.1 6.9 33.8 56.1 3.2 0.0
512 844.0 6.8 31.5 57.4 2.5 1.8
1024 747.5 6.7 26.6 59.1 5.8 1.8
2048 570.9 6.5 33.8 42.7 15.2 1.8
3 sweeps performed together
16 1813.4 5.2 94.7 0.0 0.0 0.0
32 2144.4 4.3 74.7 21.0 0.0 0.0
64 2894.3 0.1 83.9 15.8 0.1 0.0
128 2081.1 0.1 63.0 34.7 2.2 0.0
256 1622.0 7.0 29.3 61.6 2.1 0.0
512 1026.5 7.1 32.1 57.3 2.3 1.2
1024 815.8 6.8 30.2 53.2 8.6 1.2
2048 627.3 6.6 35.5 40.7 15.9 1.2

Runtime behaviour
Size MFlops
/sec
% of cycles used for
± Base Exec Cache DTB Branch R dep Nops
2 sweeps performed together
16 332.1 4.4 119.8 68.4 0.2 3.5 10.2 27.3 5.8
32 420.6 3.0 124.7 81.9 4.9 5.0 7.2 14.7 8.0
64 541.9 0.5 142.2 107.9 1.7 10.7 4.4 9.4 7.6
128 359.5 -0.9 121.0 85.2 5.5 26.6 0.0 0.0 4.6
256 294.3 -0.1 111.8 55.0 41.6 9.1 0.1 0.0 6.1
512 161.7 -0.2 104.7 30.8 66.0 5.0 0.1 0.0 3.0
1024 143.0 -0.1 103.4 26.6 70.5 3.8 0.0 0.0 2.6
2048 109.1 0.1 103.5 20.8 76.3 4.0 0.0 0.0 2.3
3 sweeps performed together
16 341.8 4.1 117.7 65.0 0.3 3.6 8.5 31.0 5.2
32 400.3 2.9 120.8 76.8 7.4 5.8 6.5 13.9 7.5
64 517.5 0.7 137.0 100.3 2.8 13.4 4.0 8.7 7.1
128 372.0 1.1 124.5 79.3 16.2 19.5 0.3 3.7 4.4
256 311.4 0.1 111.6 58.0 38.6 8.4 0.1 0.0 6.4
512 197.3 0.0 106.2 37.5 59.3 5.8 0.0 0.0 3.6
1024 156.3 -0.2 104.3 29.1 67.8 4.8 0.0 0.0 2.8
2048 119.9 0.0 104.1 23.5 73.2 4.8 0.0 0.0 2.6
Table explanation

cs10-dime@fau.de
Last Modified: 10 January 2008
Valid HTML 4.01! Powered by vim