# SparseBench
## Introduction
A hybrid MPI+OpenMP parallel sparse solver benchmark collection for evaluating the performance of iterative sparse linear solvers and sparse matrix-vector operations.
## Overview
SparseBench provides a comprehensive benchmarking suite for sparse matrix computations with support for multiple matrix formats and iterative solvers. It is designed to measure performance on both single-node and distributed memory systems.
## Supported Sparse Matrix Formats
SparseBench supports three different sparse matrix storage formats, each with different performance characteristics:
- CRS (Compressed Row Storage): Standard row-compressed format with three arrays (row pointers, column indices, values). Good general-purpose format; see the layout sketch after this list.
- SCS (Sell-C-Sigma): SELL-C-σ format optimized for vectorization and cache efficiency. Particularly effective on modern CPUs with SIMD instructions.
- CCRS (Compressed CRS): Compressed variant of CRS format for improved memory efficiency.
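As an illustration of the CRS layout, here is a minimal, self-contained sketch of the three-array storage and the SpMV loop over it. The struct and field names are illustrative only and do not necessarily match SparseBench's internal data structures:

```c
#include <stddef.h>

/* Illustrative CRS container; names do not mirror SparseBench's internals. */
typedef struct {
    size_t nrows;
    const size_t *rowPtr; /* nrows + 1 entries: start of each row in colInd/val */
    const size_t *colInd; /* column index of every stored nonzero */
    const double *val;    /* value of every stored nonzero */
} CrsMatrix;

/* y = A * x: one contiguous sweep over the nonzeros of each row. */
void spmvCrs(const CrsMatrix *A, const double *x, double *y)
{
    for (size_t i = 0; i < A->nrows; i++) {
        double sum = 0.0;
        for (size_t j = A->rowPtr[i]; j < A->rowPtr[i + 1]; j++) {
            sum += A->val[j] * x[A->colInd[j]];
        }
        y[i] = sum;
    }
}
```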
## Benchmark Types
The following benchmark types are available:
- CG: Conjugate Gradient iterative solver for symmetric positive definite systems; a sketch of the per-iteration work follows this list.
- SPMV: Sparse Matrix-Vector Multiplication (SpMV) kernel benchmark.
- GMRES: Generalized Minimal Residual method for general sparse systems (planned).
- CHEBFD: Chebyshev Filter Diagonalization (planned).
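Since CG is the default benchmark, it may help to see the work it performs per iteration: one SpMV, two dot products, and three vector updates. The following is a minimal, unpreconditioned sketch; function and parameter names are illustrative and are not SparseBench's API:

```c
#include <math.h>
#include <stddef.h>

/* Any sparse matrix-vector product, e.g. the spmvCrs sketch shown earlier. */
typedef void (*SpmvFn)(const void *A, const double *x, double *y);

static double dot(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

/* Unpreconditioned CG: each iteration costs one SpMV, two dot products and
 * three vector updates. r, p, v are caller-provided work vectors of length n. */
void cg(SpmvFn spmv, const void *A, const double *b, double *x,
        double *r, double *p, double *v, size_t n, int maxIter, double eps)
{
    spmv(A, x, r); /* r = b - A*x, p = r */
    for (size_t i = 0; i < n; i++) { r[i] = b[i] - r[i]; p[i] = r[i]; }
    double rho = dot(r, r, n);

    for (int k = 0; k < maxIter && sqrt(rho) > eps; k++) {
        spmv(A, p, v);                     /* v = A*p          */
        double alpha = rho / dot(p, v, n); /* step length      */
        for (size_t i = 0; i < n; i++) x[i] += alpha * p[i];
        for (size_t i = 0; i < n; i++) r[i] -= alpha * v[i];
        double rhoNew = dot(r, r, n);
        double beta = rhoNew / rho;        /* direction update */
        for (size_t i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
        rho = rhoNew;
    }
}
```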
## MPI Communication Algorithm
For distributed memory execution with MPI, SparseBench uses an optimized communication algorithm that reduces communication to a single exchange operation per iteration. See MPI-Algorithm.md for a detailed explanation of the localization process and communication strategy.
## Getting Started
To build and run SparseBench, you need a C compiler and GNU make. MPI is optional but required for distributed memory benchmarks.
- Install a supported compiler
  - GCC
  - Clang
  - Intel ICX
- Clone the repository
  ```
  git clone <repository-url>
  cd SparseBench
  ```
- (Optional) Adjust configuration
  On first run, `make` will copy `mk/config-default.mk` to `config.mk`. Edit `config.mk` to change the toolchain, matrix format, enable MPI/OpenMP, etc.
- Build
  ```
  make
  ```
  See the full Build section for more details.
- Usage
  ```
  ./sparseBench-<TOOLCHAIN> [options]
  ```
  See the full Usage section for more details.
  Get help on command-line arguments:
  ```
  ./sparseBench-<TOOLCHAIN> -h
  ```
## Build
### Configuration
Configure the toolchain and additional options in `config.mk`:

```make
# Supported: GCC, CLANG, ICC
TOOLCHAIN ?= CLANG
# Supported: CRS, SCS, CCRS
MTX_FMT ?= CRS
ENABLE_MPI ?= true
ENABLE_OPENMP ?= false
FLOAT_TYPE ?= DP # SP for float, DP for double
UINT_TYPE ?= U # U for unsigned int, ULL for unsigned long long int
# Feature options
OPTIONS += -DARRAY_ALIGNMENT=64
OPTIONS += -DOMP_SCHEDULE=static
#OPTIONS += -DVERBOSE
#OPTIONS += -DVERBOSE_AFFINITY
#OPTIONS += -DVERBOSE_DATASIZE
#OPTIONS += -DVERBOSE_TIMER
```
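For example, a hybrid MPI+OpenMP build using the SELL-C-σ format could be configured as follows (illustrative values only; any combination of the options documented below is possible):

```make
# Illustrative config.mk variant, not the shipped default
TOOLCHAIN ?= GCC
MTX_FMT ?= SCS
ENABLE_MPI ?= true
ENABLE_OPENMP ?= true
```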
### Configuration Options
- TOOLCHAIN: Compiler to use (GCC, CLANG, ICC)
- MTX_FMT: Sparse matrix storage format (CRS, SCS, CCRS)
- ENABLE_MPI: Enable MPI for distributed memory execution
- ENABLE_OPENMP: Enable OpenMP for shared memory parallelism
- FLOAT_TYPE (see the type-mapping sketch after this list):
  - SP: Single precision (float)
  - DP: Double precision (double)
- UINT_TYPE:
  - U: Unsigned int for matrix indices
  - ULL: Unsigned long long int for very large matrices
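These options are typically consumed through preprocessor switches that select C typedefs at build time. A minimal sketch of such a mapping follows; the macro and typedef names are purely illustrative and may differ from SparseBench's actual source:

```c
/* Illustrative only: macro and typedef names may not match SparseBench's source. */
#if defined(USE_SINGLE_PRECISION) /* FLOAT_TYPE=SP */
typedef float bench_float;
#else                             /* FLOAT_TYPE=DP (default) */
typedef double bench_float;
#endif

#if defined(USE_LONG_INDICES)     /* UINT_TYPE=ULL, for very large matrices */
typedef unsigned long long bench_uint;
#else                             /* UINT_TYPE=U (default) */
typedef unsigned int bench_uint;
#endif
```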
### Verbosity Options
The verbosity options enable detailed diagnostic output:
- `-DVERBOSE`: General verbose output
- `-DVERBOSE_AFFINITY`: Print thread affinity settings and processor bindings
- `-DVERBOSE_DATASIZE`: Print detailed memory allocation sizes
- `-DVERBOSE_TIMER`: Print timer resolution information
### Build Commands
Build with:
make
You can build multiple toolchains in the same directory, but the Makefile
only acts on the currently configured one. Intermediate build results are
located in the ./build/<TOOLCHAIN> directory.
To show all executed commands use:
make Q=
Clean up intermediate build results for the current toolchain with:
make clean
Clean up all build results for all toolchains with:
make distclean
### Optional Targets
Generate assembler files:
make asm
The assembler files will be located in the ./build/<TOOLCHAIN> directory.
Reformat all source files using clang-format (only works if clang-format
is in your path):
make format
## Usage
To run the benchmark, call:
./sparseBench-<TOOLCHAIN> [options]
### Command Line Arguments
| Option | Argument | Description |
|---|---|---|
| `-h` | — | Show help text. |
| `-f` | `<parameter file>` | Load options from a parameter file. |
| `-m` | `<MM matrix>` | Load a Matrix Market (.mtx) file. |
| `-c` | `<file name>` | Convert a Matrix Market file to binary matrix format (.bmx). |
| `-t` | `<bench type>` | Benchmark type: cg, spmv, or gmres. Default: cg. |
| `-x` | `<int>` | Size in x dimension for generated matrix (ignored if loading file). Default: 100. |
| `-y` | `<int>` | Size in y dimension for generated matrix (ignored if loading file). Default: 100. |
| `-z` | `<int>` | Size in z dimension for generated matrix (ignored if loading file). Default: 100. |
| `-i` | `<int>` | Number of solver iterations. Default: 150. |
| `-e` | `<float>` | Convergence criteria epsilon. Default: 0.0. |
### Matrix Input
SparseBench supports multiple ways to provide input matrices:
- Generated matrices: Use default or specify dimensions with `-x`, `-y`, `-z`:
  - Default mode generates a 3D 7-point stencil matrix
  - Use `-m generate7P` for explicit 7-point stencil generation
- Matrix Market files (`.mtx`): Load the standard MatrixMarket format (see the example file after this list):
  ```
  ./sparseBench-GCC -m matrix.mtx
  ```
- Binary matrix files (`.bmx`): Load the pre-converted binary format (MPI builds only):
  ```
  ./sparseBench-GCC -m matrix.bmx
  ```
  Convert a Matrix Market file to binary format with:
  ```
  ./sparseBench-GCC -c matrix.mtx
  ```
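For reference, the MatrixMarket coordinate format is plain text: a header line, a size line (rows, columns, number of nonzeros), and one `row column value` triplet per line with 1-based indices. A tiny hypothetical example describing a 3×3 matrix with 4 nonzeros:

```
%%MatrixMarket matrix coordinate real general
3 3 4
1 1 2.0
2 2 2.0
3 3 2.0
1 3 -1.0
```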
### Example Usage
Run CG solver with generated 100×100×100 matrix:
./sparseBench-GCC
Run CG solver with custom dimensions:
./sparseBench-GCC -x 200 -y 200 -z 200
Run SpMV benchmark with Matrix Market file:
./sparseBench-GCC -t spmv -m matrix.mtx -i 1000
Run GMRES solver with convergence criteria:
./sparseBench-GCC -t gmres -m matrix.mtx -e 1e-6
Run with MPI (4 processes):
mpirun -np 4 ./sparseBench-GCC -m matrix.mtx
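For a hybrid MPI+OpenMP build, one possible launch combines mpirun with the usual OpenMP environment variables (the process and thread counts here are only an example; see also the Thread Pinning section):

```
export OMP_NUM_THREADS=4
export OMP_PROC_BIND=close
export OMP_PLACES=cores
mpirun -np 2 ./sparseBench-GCC -t cg -x 200 -y 200 -z 200
```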
## MPI Algorithm
SparseBench uses an optimized MPI communication algorithm for distributed sparse matrix computations. The implementation uses a localization process that transforms the distributed matrix to enable efficient communication:
- All column indices are converted to local indices
- External elements are appended to local vectors
- Communication is reduced to a single `MPI_Neighbor_alltoallv` call per iteration
- Elements from the same source rank are stored consecutively
For a detailed explanation of the algorithm, including the four-step localization process (identify externals, build graph topology, reorder externals, build global index list), see MPI-Algorithm.md.
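To illustrate the communication pattern only (this is not SparseBench's actual code), a per-iteration halo exchange on a neighborhood communicator might look like the following sketch: owned entries that neighbors need are packed into a send buffer, and the received externals are appended directly behind the local part of the vector:

```c
#include <mpi.h>

/* Hypothetical exchange context; names and layout are illustrative only. */
typedef struct {
    MPI_Comm nbComm;   /* graph communicator over the neighboring ranks */
    int *sendList;     /* local indices of owned entries requested by neighbors */
    int sendListLen;
    double *sendBuf;   /* packed values, grouped by destination neighbor */
    int *sendCounts, *sendDispls;
    int *recvCounts, *recvDispls;
} Exchange;

/* One communication step per iteration: pack owned entries, then a single
 * collective call. Received externals land directly behind the local part of
 * the vector (at x[nLocal]), so the SpMV can work with local indices only. */
void exchangeExternals(const Exchange *ex, double *x, int nLocal)
{
    for (int i = 0; i < ex->sendListLen; i++) {
        ex->sendBuf[i] = x[ex->sendList[i]];
    }

    MPI_Neighbor_alltoallv(ex->sendBuf, ex->sendCounts, ex->sendDispls, MPI_DOUBLE,
                           &x[nLocal], ex->recvCounts, ex->recvDispls, MPI_DOUBLE,
                           ex->nbComm);
}
```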
## Testing
The tests directory contains unit tests for various components:
- Matrix format tests
- Solver validation tests
- Communication tests (MPI builds)
To run all tests:
cd tests
make
./runTests
## Thread Pinning
For OpenMP execution, controlling thread affinity is important for optimal and
reproducible performance. We recommend using likwid-pin to set the number of
threads and pin them to cores:
likwid-pin -C 0-7 ./sparseBench-GCC
If likwid-pin is not available, you can use standard OpenMP environment
variables:
export OMP_NUM_THREADS=8
export OMP_PROC_BIND=close
export OMP_PLACES=cores
./sparseBench-GCC
## Support for clangd Language Server
The Makefile will generate a .clangd configuration to correctly set all
options for the clang language server. This is only important if you use an
editor with LSP support and want to edit or explore the source code.
Generating the .clangd configuration requires GNU Make 4.0 or newer. Older make
versions will still build the project, but the .clangd configuration for the
clang language server will not be generated. Note that the default make shipped
with macOS is version 3.81! Newer make versions can easily be installed on
macOS using the Homebrew package manager.
An alternative is to use Bear, a tool that generates a compilation database for clang tooling. This method also enables jumping to any definition without having opened the corresponding buffer first. You have to build SparseBench once with Bear as a wrapper:
bear -- make
## License
Copyright © NHR@FAU, University Erlangen-Nuremberg.
This project is licensed under the MIT License - see the LICENSE file for details.