Data Local Iterative Methods For The Efficient Solution of Partial Differential Equations

A cooperation between LSS and LRR, funded by the DFG.

rb6.F Program Description

File Name:

rb6.F

Description:

Like rb4.F and rb5.F, rb6.F performs one sweep through the grid, updating each node m times. However, rb6.F tries to update each node as soon as possible. When a red node in line i is updated for the first time, the black node directly underneath it in line i-1 is also updated for the first time. As a consequence, the red node in line i-2 and the black node directly underneath it in line i-3 can be updated for the second time, and so on, until the red node in line i-2*m+2 and the black node in line i-2*m+1 are updated for the m-th time.
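The update order described above can be sketched in a few lines of Python. This is only an illustration, not the rb6.F source: it assumes a 5-point Laplace stencil with a zero right-hand side, a square grid with fixed boundary values, whole grid lines as the unit of work, and the convention that a node (row, col) is red when row + col is even. The names `relax_line`, `plain_sweeps` and `fused_sweep` are hypothetical.

```python
RED, BLACK = 0, 1  # assumed colouring: (row + col) % 2 == 0 is red


def relax_line(u, row, color):
    """Gauss-Seidel update of all nodes of one colour in one grid line."""
    n = len(u) - 2                      # number of interior lines/columns
    start = 1 + (row + 1 + color) % 2   # first interior column of this colour
    for col in range(start, n + 1, 2):
        u[row][col] = 0.25 * (u[row - 1][col] + u[row + 1][col]
                              + u[row][col - 1] + u[row][col + 1])


def plain_sweeps(u, m):
    """Reference: m separate red-black sweeps over the whole grid."""
    n = len(u) - 2
    for _ in range(m):
        for color in (RED, BLACK):
            for row in range(1, n + 1):
                relax_line(u, row, color)


def fused_sweep(u, m):
    """One pass through the grid that applies all m updates to each node."""
    n = len(u) - 2
    # Line i may lie past the grid edge so that trailing lines can finish.
    for i in range(1, n + 2 * m):
        for k in range(1, m + 1):
            red_row = i - 2 * k + 2     # red nodes here get their k-th update
            black_row = i - 2 * k + 1   # black nodes here get their k-th update
            if 1 <= red_row <= n:
                relax_line(u, red_row, RED)
            if 1 <= black_row <= n:
                relax_line(u, black_row, BLACK)
```

Under these assumptions `fused_sweep(u, m)` performs the same updates as `plain_sweeps(u, m)`, only in a different order, so both should leave the grid in the same state.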

Comment:

The data locality behaviour of the algorithm should be slightly better than that of rb5.F, because the data in lines i-2,...,i-2*m+1 is reused earlier. Hence, there is a higher chance that the data is still in the cache. Also, because the data is reused earlier, less data must be held in the cache simultaneously.
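As a rough illustration of the locality argument, one can count how many grid lines a single step of the schedule from the description touches. The sketch below is not taken from rb6.F: it assumes whole grid lines as the unit of work, a 5-point stencil (each updated line reads its two neighbouring lines), and 8-byte values. `lines_touched` and `working_set_bytes` are hypothetical helpers.

```python
def lines_touched(i, m, n):
    """Grid lines read or written while line i is the leading line."""
    touched = set()
    for k in range(1, m + 1):
        # Red line i-2k+2 and black line i-2k+1 get their k-th update.
        for row in (i - 2 * k + 2, i - 2 * k + 1):
            if 1 <= row <= n:
                # A 5-point stencil reads the line above and the line below.
                touched.update((row - 1, row, row + 1))
    return touched


def working_set_bytes(m, n, bytes_per_value=8):
    """Largest number of bytes touched in one step (boundary columns included)."""
    widest = max(len(lines_touched(i, m, n)) for i in range(1, n + 2 * m))
    return widest * (n + 2) * bytes_per_value
```

Under these assumptions a step touches at most 2*m+2 consecutive lines, so the data that should stay cached grows with m and with the line length, not with the total number of lines in the grid.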

Results:

Memory access behaviour

                  % of all accesses which go into
Size  MBytes/sec      ±  1. Level  2. Level  3. Level  Memory
2 sweeps performed together
16        1275.5   20.6      79.4       0.1       0.0     0.0
32        1251.8   20.8      77.8       1.3       0.0     0.0
64        1049.0   21.0      68.9       9.8       0.3     0.0
128       1021.0   21.1      61.7      14.4       2.7     0.0
256        910.8   21.2      30.8      45.1       2.7     0.1
512        613.7   21.1      30.6      45.0       1.5     1.8
1024       509.5   21.2      36.7      36.2       4.1     1.8
2048       380.4   20.8      26.2      38.6      12.5     1.8
3 sweeps performed together
16        1431.7   21.5      78.4       0.0       0.0     0.0
32        1374.6   22.7      60.1      17.2       0.0     0.0
64        1166.3   22.8      60.7      16.3       0.2     0.0
128       1131.5   23.5      45.4      29.1       1.9     0.0
256       1060.1   23.7      30.1      44.3       1.8     0.0
512        759.2   23.7      29.6      44.5       1.0     1.2
1024       516.5   23.5      35.9      32.7       6.7     1.2
2048       393.1   23.3      27.5      34.1      13.8     1.2

Runtime behaviour

                  % of cycles used for
Size  MFlops/sec       ±    Base    Exec   Cache     DTB  Branch   R dep    Nops
2 sweeps performed together
16         286.7     7.0   102.3    49.2     0.2     0.3     0.9    42.4     2.3
32         282.3    10.8   104.8    47.7     3.9     0.2     0.5    39.6     2.1
64         237.0     4.7   102.2    42.1     8.8     0.1     0.0    43.5     3.0
128        231.2     8.9   105.2    41.7    17.7     0.2     0.0    33.8     2.9
256        206.5     6.7   101.0    37.4    16.9     5.2     0.0    32.7     2.1
512        139.0     4.7   100.7    25.2    44.1     3.6     0.0    21.7     1.4
1024       115.5     4.8   100.3    21.0    45.8     4.7     0.0    23.4     0.6
2048        85.8     2.9   100.4    16.5    58.8     4.6     0.0    16.7     0.9
3 sweeps performed together
16         325.7     5.7   106.1    50.6     0.3     1.0     0.9    43.7     3.9
32         317.4     5.8   105.1    49.7     4.7     0.4     0.3    40.7     3.5
64         269.9     4.6   103.4    45.0     8.2     0.1     0.0    41.0     4.5
128        264.3     4.4   104.5    44.9    14.6     0.3     0.0    36.3     4.0
256        248.1     6.9   104.4    42.3    13.1     4.3     0.0    34.0     3.8
512        177.6     4.8   102.2    30.0    37.3     3.2     0.0    24.2     2.7
1024       120.6     5.7   100.8    21.1    46.0     4.8     0.0    22.0     1.2
2048        91.5     4.2   100.2    17.0    57.2     4.2     0.0    17.1     0.5

cs10-dime@fau.de
Last Modified: 10 January 2008