KASKADE 7 development version
Public Member Functions | List of all members
Kaskade::NumaCRSPatternCreator< Index > Class Template Reference

A NUMA-aware creator for matrix sparsity patterns. More...

#include <threadedMatrix.hh>

Detailed Description

template<class Index = size_t>
class Kaskade::NumaCRSPatternCreator< Index >

A NUMA-aware creator for matrix sparsity patterns.

This supports a two-phase creation of sparsity pattern. Nonzero entry positions can be added to the creator at will. At any time, a NumaCRSPattern sparsity pattern can be constructed efficiently by supplying this creator.

Template Parameters
Indexthe integral type used for row/column indices (defaults to size_t)

Definition at line 1616 of file threadedMatrix.hh.

Public Member Functions

 NumaCRSPatternCreator (Index rows_, Index cols, bool symmetric=false, int nnzPerRow=0)
 Constructs a rows times cols matrix sparsity structure creator. More...
 
 ~NumaCRSPatternCreator ()
 
void addElement (Index row, Index col)
 Enters a single entry into the sparsity pattern. More...
 
void addDenseBlock (Index fromRow, Index toRow, Index fromCol, Index toCol)
 Enters a contiguous dense block into the sparsity pattern. More...
 
template<class IterRow , class IterCol >
void addElements (IterRow const fromRow, IterRow const toRow, IterCol const fromCol, IterCol const toCol, bool colIsSorted=false)
 Enters entries into the sparsity pattern. More...
 
template<class RowRangeSequence , class ColRangeSequence >
void addElements (RowRangeSequence const &rrs, ColRangeSequence const &crs, bool colsAreSorted=false)
 Enters elements into the sparsity pattern. More...
 
void addAllElements ()
 Enters all possible elements (defining a dense matrix). More...
 
void addDiagonal ()
 Enters the diagonal elements. More...
 
size_t nonzeroes () const
 The number of structurally nonzero elements. More...
 
Index cols () const
 The number of columns. More...
 
void balance ()
 Redistributes the rows to the NUMA chunks in order to have the same number of entries in each chunk. More...
 
int nodes () const
 Returns the number of NUMA nodes/chunks used. More...
 
ChunkCreator const & creator (int node) const
 Returns the chunk creator. More...
 
bool isSymmetric () const
 Returns the symmetry status of the pattern. More...
 

Constructor & Destructor Documentation

◆ NumaCRSPatternCreator()

template<class Index = size_t>
Kaskade::NumaCRSPatternCreator< Index >::NumaCRSPatternCreator ( Index  rows_,
Index  cols,
bool  symmetric = false,
int  nnzPerRow = 0 
)
inline

Constructs a rows times cols matrix sparsity structure creator.

Parameters
rowsthe number of rows.
colsthe number of columns.
symmetricif true, the matrix is symmetric and only the lower triangular part is actually stored (only for square matrices).
nnzPerRowhint for the expected number of nonzeroes per row

The rows are distributed evenly among the NUMA nodes in case of unsymmetric patterns, and roughly proportional to \( i^{-1/2} \) for symmetric ones. This balances the number of elements in each chunk if there is the same number of elements in each row.

Providing a sufficiently large nnzPerRow hint can reduce the number of memory reallocations performed during element insertion and hence speed up the insertion significantly (a factor of 40 has been observed!) at the cost of potentially increased memory footprint. A guideline to what is "sufficiently large": The memory buffer of a row needs to temporarily store the number of entries in this row plus the entries that are inserted by the addElements operation to the row. E.g., for 2D linear finite elements on a triangular grid, this would be 7 (number of entries per row in a regular grid, say 8 or 9 as a precaution for unstructured grids) + 3 (elemental matrices are 3x3, so 3 elements are added by an addElements operation), in total 10 to 12. If memory consumption is not the prime concern, err on the larger side.

The effect need not be present, and in fact, specifying a nonzero nnzPerRow may actually be slower since all chunks request memory at the same time and put a high load on the memory management system. This has been observed to parallelize rather badly.

In the end, choose a value based on profiling your application.

Warning
Memory usage may be exhaustive if there are few essentially dense rows into which entries are added alternatingly with entries into many sparse rows!

Definition at line 1652 of file threadedMatrix.hh.

◆ ~NumaCRSPatternCreator()

template<class Index = size_t>
Kaskade::NumaCRSPatternCreator< Index >::~NumaCRSPatternCreator ( )
inline

Definition at line 1674 of file threadedMatrix.hh.

Member Function Documentation

◆ addAllElements()

template<class Index = size_t>
void Kaskade::NumaCRSPatternCreator< Index >::addAllElements ( )
inline

Enters all possible elements (defining a dense matrix).

Sometimes (e.g. with spatially constant variables), FE matrices are actually dense. In this case we need not create a sparsity pattern, but can simplify the process.

Todo:
This should be statically encoded in a different matrix type.

Definition at line 1770 of file threadedMatrix.hh.

◆ addDenseBlock()

template<class Index = size_t>
void Kaskade::NumaCRSPatternCreator< Index >::addDenseBlock ( Index  fromRow,
Index  toRow,
Index  fromCol,
Index  toCol 
)
inline

Enters a contiguous dense block into the sparsity pattern.

Parameters
fromRowfirst row in the block
toRowone behind last row in the block
fromColfirst column in the block
toColone behind last column in the block

Definition at line 1701 of file threadedMatrix.hh.

Referenced by Kaskade::BDDC::KKTSolver< Domain >::KKTSolver().

◆ addDiagonal()

template<class Index = size_t>
void Kaskade::NumaCRSPatternCreator< Index >::addDiagonal ( )
inline

Enters the diagonal elements.

Definition at line 1780 of file threadedMatrix.hh.

Referenced by Kaskade::NumaBCRSMatrix< Entry, Index >::conjugation().

◆ addElement()

template<class Index = size_t>
void Kaskade::NumaCRSPatternCreator< Index >::addElement ( Index  row,
Index  col 
)
inline

◆ addElements() [1/2]

template<class Index = size_t>
template<class IterRow , class IterCol >
void Kaskade::NumaCRSPatternCreator< Index >::addElements ( IterRow const  fromRow,
IterRow const  toRow,
IterCol const  fromCol,
IterCol const  toCol,
bool  colIsSorted = false 
)
inline

Enters entries into the sparsity pattern.

Template Parameters
IterRowrandom access iterator for a range of row indices
IterColrandom access iterator for a range of column indices
Parameters
fromRowstart of row indices
toRowone behind of last row index
fromColstart of sorted column indices
toColone behind of last column index
colIsSortedtrue if the column index range is sorted ascendingly

All entries \( (i,j) \) with \( i \) in the given row indices and \( j \) in the given column indices are entered into the sparsity structure. Note that the column indices have to be sorted. No index in the row range shall occur twice.

Definition at line 1726 of file threadedMatrix.hh.

Referenced by Kaskade::NumaCRSPatternCreator< Index >::addElement(), Kaskade::NumaCRSPatternCreator< Index >::addElements(), Kaskade::MGProlongation::asMatrix(), Kaskade::NumaBCRSMatrix< Entry, Index >::conjugation(), Kaskade::MGProlongation::galerkinProjection(), Kaskade::insertMatrixBlockIndices(), and Kaskade::NumaBCRSMatrix< Entry, Index >::operator*().

◆ addElements() [2/2]

template<class Index = size_t>
template<class RowRangeSequence , class ColRangeSequence >
void Kaskade::NumaCRSPatternCreator< Index >::addElements ( RowRangeSequence const &  rrs,
ColRangeSequence const &  crs,
bool  colsAreSorted = false 
)
inline

Enters elements into the sparsity pattern.

Template Parameters
RowRangeSequencea sequence type with value type representing a row index range
ColRangeSequencea sequence type with value type representing a column index range
Parameters
rrsthe sequence of row ranges.
crsthe sequence of column ranges. The size has to be the same as that of rrs.
colsAreSortedtrue if all the column index ranges are sorted ascendingly

For each corresponding pair of row and column ranges in the sequences

  • rrs and
  • crs, the elements of the cartesian product of the row and column ranges are entered into the sparsity structure.

The insertion of indices is done in parallel on the NUMA chunks. In order to maintain efficiency, the number or size of the ranges provided in the sequences should not be too small (a couple of ten ranges should be ok).

Definition at line 1749 of file threadedMatrix.hh.

◆ balance()

template<class Index = size_t>
void Kaskade::NumaCRSPatternCreator< Index >::balance ( )

Redistributes the rows to the NUMA chunks in order to have the same number of entries in each chunk.

The association of matrix rows to chunks is recalculated in order to have approximately the same number of nonzeroes in every chunk. While this is not an extremely expensive operation, it does several reallocations. Call this once before creating a sparsity pattern.

◆ cols()

template<class Index = size_t>
Index Kaskade::NumaCRSPatternCreator< Index >::cols ( ) const
inline

◆ creator()

template<class Index = size_t>
ChunkCreator const & Kaskade::NumaCRSPatternCreator< Index >::creator ( int  node) const
inline

Returns the chunk creator.

Parameters
nodethe number of the NUMA node / chunk (0<=node<nodes()).

Definition at line 1824 of file threadedMatrix.hh.

Referenced by Kaskade::NumaCRSPattern< Index >::NumaCRSPattern().

◆ isSymmetric()

template<class Index = size_t>
bool Kaskade::NumaCRSPatternCreator< Index >::isSymmetric ( ) const
inline

Returns the symmetry status of the pattern.

If true, the matrix is symmetric, and only the lower triangular entries are actually stored.

Definition at line 1831 of file threadedMatrix.hh.

Referenced by Kaskade::insertMatrixBlockIndices().

◆ nodes()

template<class Index = size_t>
int Kaskade::NumaCRSPatternCreator< Index >::nodes ( ) const
inline

◆ nonzeroes()

template<class Index = size_t>
size_t Kaskade::NumaCRSPatternCreator< Index >::nonzeroes ( ) const
inline

The number of structurally nonzero elements.

For symmetric storage without superdiagonal elements stored, this counts subdiagonal elements twice.

Definition at line 1791 of file threadedMatrix.hh.


The documentation for this class was generated from the following file: