KASKADE 7 development version
A NUMA-aware creator for matrix sparsity patterns.
#include <threadedMatrix.hh>
This supports a two-phase creation of sparsity patterns. In the first phase, nonzero entry positions can be added to the creator at will. At any time, a NumaCRSPattern sparsity pattern can then be constructed efficiently by supplying this creator.
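The two-phase workflow can be illustrated with a self-contained sketch that does not use Kaskade at all. The names `ToyCRSPattern`, `ToyPatternCreator`, and `buildPattern` are hypothetical. They only mimic the idea (collect positions first, then freeze them into compressed row storage) and make no claim about how NumaCRSPatternCreator or NumaCRSPattern are implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy types for illustration only, not part of Kaskade.
struct ToyCRSPattern
{
  std::vector<std::size_t> rowStart; // size rows+1, offsets into colIndex
  std::vector<std::size_t> colIndex; // column indices, sorted within each row
};

class ToyPatternCreator
{
public:
  ToyPatternCreator(std::size_t rows, std::size_t cols): cols_(cols), rows_(rows) {}

  // Phase 1: positions may be added in any order; duplicates are ignored.
  void addElement(std::size_t row, std::size_t col)
  {
    assert(row < rows_.size() && col < cols_);
    rows_[row].insert(col);
  }

  // Number of distinct positions entered so far.
  std::size_t nonzeroes() const
  {
    std::size_t n = 0;
    for (auto const& r: rows_)
      n += r.size();
    return n;
  }

  // Phase 2: freeze the collected positions into compressed row storage.
  ToyCRSPattern buildPattern() const
  {
    ToyCRSPattern p;
    p.rowStart.reserve(rows_.size()+1);
    p.rowStart.push_back(0);
    for (auto const& r: rows_)
    {
      p.colIndex.insert(p.colIndex.end(), r.begin(), r.end());
      p.rowStart.push_back(p.colIndex.size());
    }
    return p;
  }

private:
  std::size_t cols_;
  std::vector<std::set<std::size_t>> rows_;
};
```

The toy keeps one ordered set per row, which is simple but allocation-heavy. It is meant to show the separation of the two phases, not an efficient storage scheme.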
Index | the integral type used for row/column indices (defaults to size_t) |
Definition at line 1616 of file threadedMatrix.hh.
Public Member Functions
NumaCRSPatternCreator (Index rows_, Index cols, bool symmetric=false, int nnzPerRow=0)
Constructs a rows times cols matrix sparsity structure creator.
~NumaCRSPatternCreator ()
void | addElement (Index row, Index col)
Enters a single entry into the sparsity pattern.
void | addDenseBlock (Index fromRow, Index toRow, Index fromCol, Index toCol)
Enters a contiguous dense block into the sparsity pattern.
template<class IterRow , class IterCol >
void | addElements (IterRow const fromRow, IterRow const toRow, IterCol const fromCol, IterCol const toCol, bool colIsSorted=false)
Enters entries into the sparsity pattern.
template<class RowRangeSequence , class ColRangeSequence >
void | addElements (RowRangeSequence const &rrs, ColRangeSequence const &crs, bool colsAreSorted=false)
Enters elements into the sparsity pattern.
void | addAllElements ()
Enters all possible elements (defining a dense matrix).
void | addDiagonal ()
Enters the diagonal elements.
size_t | nonzeroes () const
The number of structurally nonzero elements.
Index | cols () const
The number of columns.
void | balance ()
Redistributes the rows to the NUMA chunks in order to have the same number of entries in each chunk.
int | nodes () const
Returns the number of NUMA nodes/chunks used.
ChunkCreator const & | creator (int node) const
Returns the chunk creator.
bool | isSymmetric () const
Returns the symmetry status of the pattern.
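Kaskade::NumaCRSPatternCreator< Index >::NumaCRSPatternCreator (Index rows_, Index cols, bool symmetric=false, int nnzPerRow=0) [inline]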
Constructs a rows times cols matrix sparsity structure creator.
rows | the number of rows. |
cols | the number of columns. |
symmetric | if true, the matrix is symmetric and only the lower triangular part is actually stored (only for square matrices). |
nnzPerRow | hint for the expected number of nonzeroes per row |
The rows are distributed evenly among the NUMA nodes in case of unsymmetric patterns, and roughly proportional to \( i^{-1/2} \) for symmetric ones. This balances the number of elements in each chunk if there is the same number of elements in each row.
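One way to picture such a distribution is sketched below. This is a standalone illustration under an explicit modelling assumption, not Kaskade's code: for the symmetric case it assumes that stored row i of the lower triangle holds a number of entries proportional to i, so that equal entry counts per chunk require chunk boundaries growing like the square root of the chunk number, which makes the chunk sizes shrink roughly like \( k^{-1/2} \). The function name `toyChunkStarts` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not Kaskade's code: returns nodes+1 row indices such
// that chunk k owns the rows [start[k], start[k+1]).
std::vector<std::size_t> toyChunkStarts(std::size_t rows, int nodes, bool symmetric)
{
  assert(nodes > 0);
  std::vector<std::size_t> start(nodes+1);
  for (int k=0; k<=nodes; ++k)
  {
    double const fraction = static_cast<double>(k) / nodes;
    // Unsymmetric: equally many rows per chunk.
    // Symmetric (assumption: row i stores about i entries): boundaries grow
    // like sqrt(k), so later chunks contain fewer but longer rows.
    double const position = symmetric ? std::sqrt(fraction) : fraction;
    start[k] = static_cast<std::size_t>(std::llround(position * rows));
  }
  return start;
}
```

For 100 rows on 4 chunks, the unsymmetric boundaries are 0, 25, 50, 75, 100, while the symmetric ones under this model are 0, 50, 71, 87, 100.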
Providing a sufficiently large nnzPerRow hint can reduce the number of memory reallocations performed during element insertion and hence speed up insertion significantly (a factor of 40 has been observed), at the cost of a potentially larger memory footprint. As a guideline for what is "sufficiently large": the memory buffer of a row needs to temporarily hold the entries already in this row plus the entries that a single addElements operation inserts into the row. E.g., for 2D linear finite elements on a triangular grid, this would be 7 (the number of entries per row in a regular grid; say 8 or 9 as a precaution for unstructured grids) + 3 (elemental matrices are 3x3, so 3 entries are added to a row by an addElements operation), for a total of 10 to 12. If memory consumption is not the prime concern, err on the larger side.
The speedup is not guaranteed. In fact, specifying a nonzero nnzPerRow may be slower, since all chunks then request memory at the same time and put a high load on the memory management system. This has been observed to parallelize rather badly.
In the end, choose a value based on profiling your application.
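The arithmetic behind the nnzPerRow guideline can be written down as a tiny helper. The function `toyNnzPerRowHint` is hypothetical and only restates the rule of thumb from the text (entries expected in a row plus entries added to that row by one insertion call). It is a starting point for profiling, not a recommendation from the library.

```cpp
#include <cassert>

// Hypothetical helper: buffer size guideline = entries expected in a row
// + entries added to that row by a single addElements call.
constexpr int toyNnzPerRowHint(int expectedEntriesPerRow, int entriesAddedPerCall)
{
  return expectedEntriesPerRow + entriesAddedPerCall;
}

// 2D linear finite elements on a triangular grid, numbers taken from the text.
constexpr int regularGridHint      = toyNnzPerRowHint(7, 3);
constexpr int unstructuredGridHint = toyNnzPerRowHint(9, 3);
static_assert(regularGridHint == 10, "7 row entries + 3 per elemental matrix");
static_assert(unstructuredGridHint == 12, "9 row entries + 3 per elemental matrix");
```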
Definition at line 1652 of file threadedMatrix.hh.
Kaskade::NumaCRSPatternCreator< Index >::~NumaCRSPatternCreator () [inline]
Definition at line 1674 of file threadedMatrix.hh.
void Kaskade::NumaCRSPatternCreator< Index >::addAllElements () [inline]
Enters all possible elements (defining a dense matrix).
Sometimes (e.g. with spatially constant variables), FE matrices are actually dense. In this case we need not create a sparsity pattern, but can simplify the process.
Definition at line 1770 of file threadedMatrix.hh.
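void Kaskade::NumaCRSPatternCreator< Index >::addDenseBlock (Index fromRow, Index toRow, Index fromCol, Index toCol) [inline]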
Enters a contiguous dense block into the sparsity pattern.
fromRow | first row in the block |
toRow | one past the last row in the block |
fromCol | first column in the block |
toCol | one past the last column in the block |
Definition at line 1701 of file threadedMatrix.hh.
Referenced by Kaskade::BDDC::KKTSolver< Domain >::KKTSolver().
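The half-open convention of the block bounds can be made concrete with a small standalone sketch. The function `toyDenseBlock` is hypothetical and merely lists the positions that a block [fromRow, toRow) x [fromCol, toCol) covers. It does not touch any Kaskade data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pattern creator: it simply records positions.
std::vector<std::pair<std::size_t,std::size_t>>
toyDenseBlock(std::size_t fromRow, std::size_t toRow, std::size_t fromCol, std::size_t toCol)
{
  assert(fromRow <= toRow && fromCol <= toCol);
  std::vector<std::pair<std::size_t,std::size_t>> entries;
  entries.reserve((toRow-fromRow) * (toCol-fromCol));
  for (std::size_t i=fromRow; i<toRow; ++i)     // toRow itself is excluded
    for (std::size_t j=fromCol; j<toCol; ++j)   // toCol itself is excluded
      entries.emplace_back(i, j);
  return entries;
}
```

For example, `toyDenseBlock(1, 3, 4, 6)` yields the four positions (1,4), (1,5), (2,4), (2,5).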
void Kaskade::NumaCRSPatternCreator< Index >::addDiagonal () [inline]
Enters the diagonal elements.
Definition at line 1780 of file threadedMatrix.hh.
Referenced by Kaskade::NumaBCRSMatrix< Entry, Index >::conjugation().
void Kaskade::NumaCRSPatternCreator< Index >::addElement (Index row, Index col) [inline]
Enters a single entry into the sparsity pattern.
Note that entering individual entries has a significant overhead. Use one of the other methods if possible.
Definition at line 1688 of file threadedMatrix.hh.
Referenced by Kaskade::NumaCRSPatternCreator< Index >::addDenseBlock(), Kaskade::NumaCRSPatternCreator< Index >::addDiagonal(), Kaskade::NumaBCRSMatrix< Entry, Index >::diagcat(), Kaskade::MGProlongation::galerkinProjection(), Kaskade::getContactConstraints(), Kaskade::NumaBCRSMatrix< Entry, Index >::horzcat(), Kaskade::nonNestedProlongation(), Kaskade::operator+(), Kaskade::prolongation(), Kaskade::NumaBCRSMatrix< Entry, Index >::reshapeBlocks(), Kaskade::sparseUnitMatrix(), Kaskade::twoGridProlongation(), and Kaskade::NumaBCRSMatrix< Entry, Index >::vertcat().
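template<class IterRow , class IterCol >
void Kaskade::NumaCRSPatternCreator< Index >::addElements (IterRow const fromRow, IterRow const toRow, IterCol const fromCol, IterCol const toCol, bool colIsSorted=false) [inline]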
Enters entries into the sparsity pattern.
IterRow | random access iterator for a range of row indices |
IterCol | random access iterator for a range of column indices |
fromRow | start of row indices |
toRow | one past the last row index |
fromCol | start of sorted column indices |
toCol | one past the last column index |
colIsSorted | true if the column index range is sorted in ascending order |
All entries \( (i,j) \) with \( i \) in the given row indices and \( j \) in the given column indices are entered into the sparsity structure. Note that the column indices have to be sorted. No index in the row range shall occur twice.
Definition at line 1726 of file threadedMatrix.hh.
Referenced by Kaskade::NumaCRSPatternCreator< Index >::addElement(), Kaskade::NumaCRSPatternCreator< Index >::addElements(), Kaskade::MGProlongation::asMatrix(), Kaskade::NumaBCRSMatrix< Entry, Index >::conjugation(), Kaskade::MGProlongation::galerkinProjection(), Kaskade::insertMatrixBlockIndices(), and Kaskade::NumaBCRSMatrix< Entry, Index >::operator*().
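The Cartesian-product semantics of this overload can be sketched in a standalone way. The toy below is not Kaskade's implementation: `ToyRows` and `toyAddElements` are hypothetical, and the handling of `colIsSorted` (sorting a copy of the column indices when the flag is false) is an assumption made for this illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical toy creator: one sorted column index vector per row.
using ToyRows = std::vector<std::vector<std::size_t>>;

template <class IterRow, class IterCol>
void toyAddElements(ToyRows& rows, IterRow fromRow, IterRow toRow,
                    IterCol fromCol, IterCol toCol, bool colIsSorted=false)
{
  // Assumption of this sketch: unsorted column input is sorted on a copy.
  std::vector<std::size_t> cols(fromCol, toCol);
  if (!colIsSorted)
    std::sort(cols.begin(), cols.end());
  cols.erase(std::unique(cols.begin(), cols.end()), cols.end());

  // Enter every (i,j) with i in the row indices and j in the column indices.
  for ( ; fromRow != toRow; ++fromRow)
  {
    assert(*fromRow < rows.size());
    std::vector<std::size_t>& row = rows[*fromRow];
    std::vector<std::size_t> merged;
    merged.reserve(row.size() + cols.size());
    // Merging two sorted ranges keeps the row sorted and drops duplicates.
    std::set_union(row.begin(), row.end(), cols.begin(), cols.end(),
                   std::back_inserter(merged));
    row.swap(merged);
  }
}
```

Sorted column input lets each row be updated by a single linear merge, which is one plausible reason why a sortedness flag is worth passing along.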
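template<class RowRangeSequence , class ColRangeSequence >
void Kaskade::NumaCRSPatternCreator< Index >::addElements (RowRangeSequence const &rrs, ColRangeSequence const &crs, bool colsAreSorted=false) [inline]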
Enters elements into the sparsity pattern.
RowRangeSequence | a sequence type with value type representing a row index range |
ColRangeSequence | a sequence type with value type representing a column index range |
rrs | the sequence of row ranges. |
crs | the sequence of column ranges. The size has to be the same as that of rrs. |
colsAreSorted | true if all the column index ranges are sorted in ascending order |
For each corresponding pair of row and column ranges in the sequences, all entries \( (i,j) \) with \( i \) in the row range and \( j \) in the column range are entered into the sparsity structure.
The insertion of indices is done in parallel on the NUMA chunks. In order to maintain efficiency, the number or size of the ranges provided in the sequences should not be too small (a few dozen ranges should be ok).
Definition at line 1749 of file threadedMatrix.hh.
void Kaskade::NumaCRSPatternCreator< Index >::balance ()
Redistributes the rows to the NUMA chunks in order to have the same number of entries in each chunk.
The association of matrix rows to chunks is recalculated in order to have approximately the same number of nonzeroes in every chunk. While this is not an extremely expensive operation, it performs several reallocations. Call it once, before creating a sparsity pattern.
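The kind of recalculation described here can be sketched as a greedy prefix-sum partition. This is a standalone illustration, not the algorithm used by balance(): the function `toyBalance` and its input (a vector with the number of entries per row) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: given the number of entries in each row, return
// chunks+1 row indices so that chunk k owns rows [start[k], start[k+1]) and
// every chunk holds roughly total/chunks entries.
std::vector<std::size_t> toyBalance(std::vector<std::size_t> const& nnzPerRow, int chunks)
{
  assert(chunks > 0);
  std::size_t const total = std::accumulate(nnzPerRow.begin(), nnzPerRow.end(), std::size_t(0));
  std::vector<std::size_t> start(chunks+1, nnzPerRow.size());
  start[0] = 0;
  std::size_t seen = 0; // entries in the rows processed so far
  int k = 1;            // index of the next chunk boundary to place
  for (std::size_t row=0; row<nnzPerRow.size() && k<chunks; ++row)
  {
    seen += nnzPerRow[row];
    // Place boundary k once the rows so far hold at least k/chunks of all entries.
    while (k<chunks && seen*chunks >= total*k)
      start[k++] = row+1;
  }
  return start;
}
```

With entry counts 3, 3, 3, 3, 6, 6 and two chunks, the boundary lands after row 3, so both chunks hold 12 entries although the first owns four rows and the second only two.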
Index Kaskade::NumaCRSPatternCreator< Index >::cols () const [inline]
The number of columns.
Definition at line 1802 of file threadedMatrix.hh.
Referenced by Kaskade::insertMatrixBlockIndices(), and Kaskade::NumaCRSPatternCreator< Index >::NumaCRSPatternCreator().
ChunkCreator const & Kaskade::NumaCRSPatternCreator< Index >::creator (int node) const [inline]
Returns the chunk creator.
node | the number of the NUMA node / chunk (0<=node<nodes()). |
Definition at line 1824 of file threadedMatrix.hh.
Referenced by Kaskade::NumaCRSPattern< Index >::NumaCRSPattern().
bool Kaskade::NumaCRSPatternCreator< Index >::isSymmetric () const [inline]
Returns the symmetry status of the pattern.
If true, the matrix is symmetric, and only the lower triangular entries are actually stored.
Definition at line 1831 of file threadedMatrix.hh.
Referenced by Kaskade::insertMatrixBlockIndices().
int Kaskade::NumaCRSPatternCreator< Index >::nodes () const [inline]
Returns the number of NUMA nodes/chunks used.
This is guaranteed not to exceed the number of NUMA nodes, but can be less.
Definition at line 1818 of file threadedMatrix.hh.
Referenced by Kaskade::NumaCRSPatternCreator< Index >::addAllElements(), Kaskade::NumaCRSPatternCreator< Index >::addElements(), Kaskade::NumaCRSPattern< Index >::NumaCRSPattern(), Kaskade::NumaCRSPatternCreator< Index >::NumaCRSPatternCreator(), and Kaskade::NumaCRSPatternCreator< Index >::~NumaCRSPatternCreator().
size_t Kaskade::NumaCRSPatternCreator< Index >::nonzeroes () const [inline]
The number of structurally nonzero elements.
For symmetric storage without superdiagonal elements stored, this counts subdiagonal elements twice.
Definition at line 1791 of file threadedMatrix.hh.
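The counting rule for symmetric storage can be spelled out in a few lines. The helper `toySymmetricNonzeroes` below is hypothetical and works on a plain list of stored positions. It only illustrates that each stored subdiagonal position stands for two structural nonzeroes of the full matrix, while a diagonal position stands for one.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: counts the structural nonzeroes of the full symmetric
// matrix from the stored lower-triangular positions (row >= col).
std::size_t toySymmetricNonzeroes(std::vector<std::pair<std::size_t,std::size_t>> const& stored)
{
  std::size_t count = 0;
  for (auto const& rc: stored)
  {
    assert(rc.first >= rc.second);           // only the lower triangle is stored
    count += rc.first == rc.second ? 1 : 2;  // subdiagonal entries have a mirror image
  }
  return count;
}
```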