Publications/Theses
Copyright:
The copyrights of the following papers are held by
the publishers. The attached PostScript or PDF files
are preprints. Please treat this material in a way
consistent with the "fair use" provisions of the
appropriate copyright laws.
BibTeX Support:
Download a BibTeX file
of the DiME project publications and theses.
2007 and 2008:
- M. Stürmer, H. Köstler, and
U. Rüde. A fast full multigrid solver for
applications in image processing. Numer. Linear
Algebra Appl., 15:187–200, 2008.
- Josef Weidendorfer and Carsten Trinitis.
Off-loading Application controlled Data Prefetching
in numerical Codes for Multi-Core Processors.
Int. J. High Performance Computing and
Networking, 4(1):22–28, 2008.
pdf
(preprint)
- M. Stürmer, J. Treibig, and
U. Rüde. Optimising a 3D Multigrid
Algorithm for the IA-64 Architecture.
International Journal of Computational Science
and Engineering (IJCSE), 4(1):29–35,
2008.
- Tobias Gradl and Ulrich Rüde. Massively
Parallel Multilevel Finite Element Solvers on the
Altix 4700. inSiDE, 5(2):24–29,
2007.
link
- C. Freundl, T. Gradl,
U. Rüde, and B. Bergen. Petascale
Computing: Algorithms and Applications, chapter
Towards Petascale Multilevel Finite Element
Solvers. Chapman & Hall/CRC, December
2007.
- M. Stürmer, J. Götz,
G. Richter, and U. Rüde. Blood Flow
Simulation on the Cell Broadband Engine using the
Lattice Boltzmann Method. Technical Report 07-9,
Lehrstuhl für Informatik 10
(Systemsimulation),
Friedrich-Alexander-Universität
Erlangen-Nürnberg, September 2007.
pdf
- H. Köstler, M. Stürmer,
C. Freundl, and U. Rüde. PDE based
Video Compression in Real Time. Technical Report
07-11, Lehrstuhl für Informatik 10
(Systemsimulation),
Friedrich-Alexander-Universität
Erlangen-Nürnberg, August 2007.
pdf
- M. Stürmer, H. Köstler, and
U. Rüde. A fast multigrid solver for
applications in image processing. Technical Report
07-6, Lehrstuhl für Informatik 10
(Systemsimulation),
Friedrich-Alexander-Universität
Erlangen-Nürnberg, May 2007.
pdf
- C. C. Douglas, U. Rüde,
J. Hu, and M. L. Bittencourt. A Guide to
Designing Cache Aware Multigrid Algorithms.
Technical Report 07-3, Lehrstuhl für
Informatik 10 (Systemsimulation),
Friedrich-Alexander-Universität
Erlangen-Nürnberg, April 2007.
pdf
2006:
- B. Bergen, T. Gradl,
F. Hülsemann, and U. Rüde. A
Massively Parallel Multigrid Method for Finite
Elements. Computing in Science and
Engineering, 8(6):56–62, December
2006.
- J. Habich. Improving computational
efficiency of Lattice Boltzmann methods on complex
geometries. Bachelor's Thesis, December
2006.
pdf
- G. Wellein, T. Zeiser, G. Hager,
and S. Donath. On the single processor
performance of simple lattice Boltzmann kernels.
Computers & Fluids,
35(8–9):910–919, November 2006. ISSN
0045-7930.
link
- J. Härdtlein, A. Linke, and
C. Pflaum. Blocking Techniques with Fast
Expression Templates. Technical Report 06-8,
Lehrstuhl für Informatik 10
(Systemsimulation),
Friedrich-Alexander-Universität
Erlangen-Nürnberg, November 2006.
pdf
- Vlasia Anagnostopoulou. Exploiting multi-core
processors for memory-bound numerical codes by
using prefetching techniques. Lehrstuhl für
Rechnertechnik und
Rechnerorganisation/Parallelrechnerarchitektur
(LRR), Informatik 10, Fakultät für
Informatik, Technische Universität
München, Germany, October 2006.
Diplomarbeit.
pdf
- M. Stürmer, J. Treibig, and
U. Rüde. Optimizing a 3D Multigrid
Algorithm for the IA-64 Architecture. In
M. Becker and H. Szczerbicka, editors,
Simulationstechnique - 19th Symposium in
Hannover, September 2006, volume 16 of
Frontiers in Simulation, pages
271–276. ASIM, SCS Publishing House,
September 2006.
pdf
- Josef Weidendorfer and Carsten Trinitis. Block
Prefetching for Numerical Codes. In Proc. of the
ASIM-06 Conf., Frontiers in Simulation. SCS,
2006.
pdf
- A. Nitsure, K. Iglberger,
U. Rüde, C. Feichtinger,
G. Wellein, and G. Hager. Optimization of
Cache Oblivious Lattice Boltzmann Method in 2D and
3D. In M. Becker and H. Szczerbicka,
editors, Simulationstechnique - 19th Symposium in
Hannover, September 2006, volume 16 of
Frontiers in Simulation, pages
265–270. ASIM, SCS Publishing House,
September 2006.
pdf
- A. Nitsure. Implementation and optimization
of a cache-oblivious Lattice Boltzmann algorithm.
Master's thesis, Lehrstuhl für
Informatik 10 (Systemsimulation),
Friedrich-Alexander-Universität
Erlangen-Nürnberg, August 2006.
pdf
- J. Götz. Numerical simulation of
blood flow with lattice Boltzmann methods.
Master's thesis, Lehrstuhl für
Informatik 10 (Systemsimulation),
Friedrich-Alexander-Universität
Erlangen-Nürnberg, July 2006.
pdf
- Josef Weidendorfer and Carsten Trinitis. Cache
Optimizations for Iterative Numerical Codes Aware
of Hardware Prefetching. volume 3732 of Lecture
Notes in Computer Science, pages 921–927.
Springer, 2006.
pdf
- Markus Stürmer. Optimierung von
Mehrgitteralgorithmen auf der IA-64
Rechnerarchitektur. Diplomarbeit, Lehrstuhl
für Informatik 10 (Systemsimulation), Institut
für Informatik, University of
Erlangen-Nuremberg, Germany, 2006.
pdf
- J. Götz. Simulation of bloodflow in
aneurysms using the lattice Boltzmann method and an
adapted data structure. Technical Report 06-6,
Lehrstuhl für Informatik 10
(Systemsimulation),
Friedrich-Alexander-Universität
Erlangen-Nürnberg, 2006.
pdf
2005:
- S. Donath, T. Zeiser, G. Hager,
J. Habich, and G. Wellein. Optimizing
Performance of the Lattice Boltzmann Method for
Complex Structures on Cache-based Architectures. In
F. Hülsemann, M. Kowarschik, and
U. Rüde, editors, 18th Symposium
Simulationstechnique ASIM 2005 Proceedings,
volume 15 of Frontiers in Simulation,
pages 728–735. ASIM, SCS Publishing House,
September 2005.
- J. Treibig, S. Hausmann, and
U. Rüde. Performance Analysis of the
Lattice Boltzmann Method on x86-64 Architectures.
In F. Hülsemann, M. Kowarschik, and
U. Rüde, editors, 18th Symposium
Simulationstechnique ASIM 2005 Proceedings,
volume 15 of Frontiers in Simulation,
pages 736–741. ASIM, SCS Publishing House,
September 2005.
pdf
- B. Bergen. Hierarchical Hybrid Grids:
Data Structures and Core Algorithms for Efficient
Finite Element Simulations on Supercomputers.
PhD thesis, FAU Erlangen, 2005.
- Josef Weidendorfer and Carsten Trinitis.
Collecting and Exploiting Cache-Reuse Metrics. In
ICCS 2005: 5th International Conference on
Computational Science, volume 3515 of LNCS, pages
191-198. Springer, May 2005.
pdf
- Stefan Lukowitz. Ermittlung des Quellcodebezugs
von Speicherzugriffen auf Datenobjekte. Lehrstuhl
für Rechnertechnik und
Rechnerorganisation/Parallelrechnerarchitektur
(LRR), Informatik 10, Fakultät für
Informatik, Technische Universität
München, Germany, May 2005. Diplomarbeit.
pdf
- B. Bergen, F. Hülsemann, and
U. Rüde. Is 1.7×10^10
Unknowns the Largest Finite Element System that Can
Be Solved Today? In SC '05: Proceedings of
the 2005 ACM/IEEE conference on Supercomputing,
Washington, DC, USA, 2005. IEEE Computer
Society.
link
- Markus Stürmer. Optimierung des
Red-Black-Gauss-Seidel-Verfahrens auf
ausgewählten x86-Prozessoren. Studienarbeit,
Lehrstuhl für Informatik 10
(Systemsimulation), Institut für Informatik,
University of Erlangen-Nuremberg, Germany,
2005.
pdf
- Simon Hausmann. Optimization and Performance
Analysis of the Lattice Boltzmann Method on x86-64
based Architectures. Bachelor Thesis, Lehrstuhl
für Informatik 10 (Systemsimulation), Institut
für Informatik, University of
Erlangen-Nuremberg, Germany, 2005.
pdf
- G. Hager, T. Zeiser, J. Treibig,
and G. Wellein. Optimizing Performance on
Modern HPC Systems: Learning From Simple Kernel
Benchmarks. Conference Paper, March 2005. 2nd
Russian-German Advanced Research Workshop on
Computational Science and High Performance
Computing, German-Russian Center for Computational
Technologies and High Performance Computing,
Stuttgart, 16.03.2005.
2004:
- T. Pohl, N. Thürey,
F. Deserno, U. Rüde,
P. Lammers, G. Wellein, and
T. Zeiser. Performance Evaluation of Parallel
Large-Scale Lattice Boltzmann Applications on Three
Supercomputing Architectures. November 2004.
Supercomputing Conference 04.
- S. Donath. On Optimized Implementations of
the Lattice Boltzmann Method on Contemporary
Architectures. Bachelor's Thesis, August
2004.
pdf
- Dietrich Christopheit. Vervollständigung
von Profiling-Messungen durch Kombination.
Lehrstuhl für Rechnertechnik und
Rechnerorganisation/Parallelrechnerarchitektur
(LRR), Informatik 10, Fakultät für
Informatik, Technische Universität
München, Germany, August 2004.
Diplomarbeit.
pdf
- Markus Kowarschik. Data Locality Optimizations
for Iterative Numerical Algorithms and Cellular
Automata on Hierarchical Memory Architectures. PhD
thesis. July 2004, SCS Publishing House, Germany.
ISBN 3-936150-39-7.
ps.gz,
pdf
- Stefan Hamerl. Entwicklung eines
komponentenbasierten Designs zur interaktiven
Visualisierung und Steuerung von
Profiling-Messungen. Lehrstuhl für
Rechnertechnik und
Rechnerorganisation/Parallelrechnerarchitektur
(LRR), Informatik 10, Fakultät für
Informatik, Technische Universität
München, Germany, July 2004. Diplomarbeit.
pdf
- Manfred Hauser. Assemblerbasierte Optimierungen
für EPIC- und CISC-Architekturen bei
iterativen numerischen Codes. Lehrstuhl für
Rechnertechnik und
Rechnerorganisation/Parallelrechnerarchitektur
(LRR), Informatik 10, Fakultät für
Informatik, Technische Universität
München, Germany, July 2004. Diplomarbeit.
pdf
- Josef Weidendorfer and Carsten Trinitis. Cache
Optimization for Iterative Numerical Codes Aware of
Hardware Prefetching. International Conference on
Applied Parallel Computing (PARA 04), Copenhagen,
Denmark, June 2004.
- Markus Kowarschik, Iris Christadler and Ulrich
Rüde. Towards Cache-Optimized Multigrid Using
Patch-Adaptive Relaxation. In Proceedings of the
2004 Conference on Applied Parallel Computing
(PARA'04), Copenhagen, Denmark, June 2004. Lecture
Notes in Computer Science (LNCS), Springer.
ps.gz
- Josef Weidendorfer, Markus Kowarschik, and
Carsten Trinitis. A Tool Suite for Simulation Based
Analysis of Memory Access Behavior. In
Proceedings of the 2004 International
Conference on Computational Science, Krakow,
Poland, June 2004. Lecture Notes in Computer
Science (LNCS), vol. 3038, Springer.
pdf
2003:
- Jan Treibig et al. Performance Analysis of the
Lattice Boltzmann Method on x86-64 Architectures.
In Proceedings of the ASIM-05 Conference, volume
2790 of Frontiers in Simulation, pages 441-450.
SCS, 2003.
pdf
- Markus Kowarschik and Christian Weiß. An
Overview of Cache Optimization Techniques and
Cache-Aware Numerical Algorithms. Proceedings
of the GI-Dagstuhl Forschungsseminar: Algorithms for
Memory Hierarchies, Lecture Notes in Computer
Science (LNCS), Vol. 2625, Springer.
ps.gz
- Jens Wilke. Cache Optimizations for the Lattice
Boltzmann Method in 2D. Studienarbeit, Lehrstuhl
für Informatik 10 (Systemsimulation), Institut
für Informatik, University of
Erlangen-Nuremberg, Germany, 2003.
ps.gz
- Jens Wilke, Thomas Pohl, Markus Kowarschik, and
Ulrich Rüde. Cache Performance Optimizations
for Parallel Lattice Boltzmann Codes in 2D.
Technical Report 03-3, Lehrstuhl für
Informatik 10 (Systemsimulation), University of
Erlangen-Nuremberg, Germany, 2003.
pdf
- Jens Wilke, Thomas Pohl, Markus Kowarschik, and
Ulrich Rüde. Cache Performance Optimizations
for Parallel Lattice Boltzmann Codes.
Proceedings of the Euro-Par '03
Conference, Klagenfurt, Austria, August 2003.
Lecture Notes in Computer Science (LNCS), vol.
2790, Springer. Joint work between our project and
the FreeWiHR project.
ps.gz
- Klaus Iglberger. Cache Optimizations for the
Lattice Boltzmann Method in 3D. Bachelor Thesis,
Lehrstuhl für Informatik 10
(Systemsimulation), Institut für Informatik,
University of Erlangen-Nuremberg, Germany,
2003.
ps.gz
- Thomas Pohl, Markus Kowarschik, Jens Wilke,
Klaus Iglberger, and Ulrich Rüde. Optimization
and Profiling of the Cache Performance of Parallel
Lattice Boltzmann Codes in 2D and 3D. Technical
Report 03-8, Lehrstuhl für Informatik 10
(Systemsimulation), University of
Erlangen-Nuremberg, Germany, 2003.
pdf
- Thomas Pohl, Markus Kowarschik, Jens Wilke,
Klaus Iglberger, and Ulrich Rüde. Optimization
and Profiling of the Cache Performance of Parallel
Lattice Boltzmann Codes. Joint work between our
project and the FreeWiHR project. Parallel
Processing Letters, Vol. 13, No. 4 (2003) pp.
549-560.
- Iris Christadler. Patch-adaptive Relaxation als
Glätter im Mehrgitterverfahren. Studienarbeit,
Lehrstuhl für Informatik 10
(Systemsimulation), Institut für Informatik,
University of Erlangen-Nuremberg, Germany,
2003.
ps.gz
2002:
- Christian Weiß, Hermann Hellwagner, Linda
Stals, and Ulrich Rüde. Data Locality
Optimizations to Improve The Efficiency of
Multigrid Methods. Technical Report 02-1, Lehrstuhl
für Informatik 10 (Systemsimulation),
University of Erlangen-Nuremberg, Germany,
2002.
Presented at the 14th GAMM-Seminar on Concepts
of Numerical Software in Kiel, Germany, Jan
23-25, 1998.
ps.gz
- Nils Thürey. Cache Optimizations for
Multigrid in 3D. Studienarbeit, Lehrstuhl für
Informatik 10 (Systemsimulation), Institut für
Informatik, University of Erlangen-Nuremberg,
Germany, 2002.
ps.gz
- Markus Kowarschik, Christian Weiß, and
Ulrich Rüde. Data Layout Optimizations for
Variable Coefficient Multigrid. In Proceedings
of the 2002 International Conference on
Computational Science, Amsterdam, The
Netherlands, April 2002. Lecture Notes in Computer
Science (LNCS), vol. 2331, Springer.
ps.gz
- Markus Kowarschik, Ulrich Rüde, Nils
Thürey, and Christian Weiß. Performance
Optimization of 3D Multigrid on Hierarchical Memory
Architectures. In Proceedings of the 2002
Conference on Applied Parallel Computing
(PARA'02), Espoo, Finland, June 2002. Lecture
Notes in Computer Science (LNCS), vol. 2367,
Springer.
ps.gz
- Markus Kowarschik and Christian Weiß.
Cache Performance Tuning of Numerically Intensive
Codes. Technical Report 02-2, Lehrstuhl für
Informatik 10 (Systemsimulation), University of
Erlangen-Nuremberg, Germany, 2002.
ps.gz
2001:
- W. Karl, M. Kowarschik, U. Rüde, and C.
Weiß. DiMEPACK: A Cache-Aware Multigrid
Library: User Manual. Technical Report 01-1,
Lehrstuhl für Informatik 10
(Systemsimulation), University of
Erlangen-Nuremberg, Germany, 2001.
ps.gz
- Markus Kowarschik and Christian Weiß.
DiMEPACK - A Cache-Optimized Multigrid Library. In
H.R. Arabnia, editor, Proceedings of
International Conference on Parallel and
Distributed Processing Techniques and Applications
(PDPTA 2001), volume I, Las Vegas, Nevada,
USA, June, 2001. CSREA, CSREA Press.
ps.gz
- Christian Weiß. Data Locality
Optimizations for Multigrid Methods on Structured
Grids. PhD thesis, Lehrstuhl für
Rechnertechnik und Rechnerorganisation, Institut
für Informatik, Technische Universität
München, Munich, Germany, December 2001.
ps.gz
- V. Daum. Runtime-Adaptive Techniques for
Multigrid Methods. Studienarbeit, Department of
Computer Science, University of Erlangen-Nuremberg,
Germany, 2001.
ps.gz
2000:
- Craig C. Douglas, Jonathan Hu, Markus
Kowarschik, Ulrich Rüde, and Christian
Weiß. Cache Optimization for Structured and
Unstructured Grid Multigrid. Electronic
Transactions on Numerical Analysis (ETNA),
10:21-40, February 2000.
ps.gz
- Craig C. Douglas, Jonathan Hu, Wolfgang Karl,
Markus Kowarschik, Ulrich Rüde, and Christian
Weiß. Fixed and Adaptive Cache Aware
Algorithms for Multigrid Methods. In Multigrid
VI. Proceedings of the Sixth European Multigrid
Conference held in Gent, Belgium, September 27-30,
1999, Lecture Notes in Computational Science
and Engineering (LNCSE), Vol. 14, July 2000.
ps.gz
- Markus Kowarschik, Ulrich Rüde, Christian
Weiß, and Wolfgang Karl. Cache-Aware
Multigrid Methods for Solving Poisson's Equation in
Two Dimensions. Computing, 64 (2000), pp.
381-399.
ps.gz
- Craig C. Douglas, Jonathan Hu, Mohamed
Iskandarani, Markus Kowarschik, Ulrich Rüde,
and Christian Weiß. Maximizing Cache Memory
Usage for Multigrid Algorithms for Applications of
Fluid Flow in Porous Media. In Z. Chen, R.E. Ewing,
and Z.-C. Shi (editors), Numerical Treatment of
Multiphase Flows and Transport in Porous Media.
Proceedings of the International Workshop Held at
Beijing, China, August 2-6, 1999, Lecture
Notes in Physics, pp. 124ff. Springer, August
2000.
ps.gz
- Craig C. Douglas, Gundolf Haase, Jonathan Hu,
Markus Kowarschik, Ulrich Rüde, and Christian
Weiß. Portable Memory Hierarchy Techniques
For PDE Solvers: Part I. SIAM News, 33(5),
June 2000.
ps.gz
- Craig C. Douglas, Gundolf Haase, Jonathan Hu,
Markus Kowarschik, Ulrich Rüde, and Christian
Weiß. Portable Memory Hierarchy Techniques
For PDE Solvers: Part II. SIAM News,
33(6), July 2000.
ps.gz
- H. Pfänder. Cache optimierte
Mehrgitterverfahren mit variablen Koeffizienten auf
strukturierten Gittern. Diplomarbeit, Department of
Computer Science, University of Erlangen-Nuremberg,
Germany, 2000.
ps.gz
- M. Zetlmeisl. Performance Optimization of
Numerically Intensive Codes - A Case Study From
Biomedical Engineering. Studienarbeit, Department
of Computer Science, University of
Erlangen-Nuremberg, Germany, 2000.
1999:
- H. Wörndl-Aichriedler. Adaptive
Mehrgitterverfahren in Raum und Zeit. Diplomarbeit,
Department of Computer Science, University of
Erlangen-Nuremberg, Germany, 1999.
- Christian Weiß, Wolfgang Karl, Markus
Kowarschik, Ulrich Rüde. Memory
Characteristics of Iterative Methods. In
Proceedings of the Supercomputing
Conference, Portland, Oregon, November
1999.
ps.gz
1998:
- Ulrich Rüde. Technological Trends and
their Impact on the Future of Supercomputing. In
H.-J. Bungartz, F. Durst, and C. Zenger (editors),
High Performance Scientific and Engineering
Computing, Proceedings of the International
FORTWIHR Conference on HPSEC, Lecture Notes in
Computational Science and Engineering (LNCSE), Vol.
8, pages 459-471. Springer, March 1998.
ps.gz
1997:
- Ulrich Rüde. Iterative Algorithms on High
Performance Architectures. In Proceedings of
the EuroPar97 Conference, Lecture Notes in
Computer Science (LNCS), pages 26-29. Springer,
August 1997.
ps.gz
- Linda Stals, Ulrich Rüde. Techniques for
Improving the Data Locality of Iterative Methods.
Technical Report MRR97-038, School of Mathematical
Sciences, Australian National University, October
1997.
ps.gz
- Linda Stals, Ulrich Rüde, Christian
Weiß, and Hermann Hellwagner. Data Local
Iterative Methods for the Efficient Solution of
Partial Differential Equations. In Proceedings
of the Eighth Biennial Computational Techniques
and Applications Conference, Adelaide,
Australia, September 1997.
ps.gz