API

This section provides a detailed list of the library API.

Host Utility Functions

template<typename DataType>
void rocalution::allocate_host(int size, DataType **ptr)

Allocate buffer on the host.

allocate_host allocates a buffer on the host.

Parameters
  • [in] size – number of elements the buffer needs to hold

  • [out] ptr – address of the pointer that will receive the allocated buffer. It is expected that *ptr == NULL.

Template Parameters

DataType – can be char, int, unsigned int, float, double, std::complex<float> or std::complex<double>.

template<typename DataType>
void rocalution::free_host(DataType **ptr)

Free buffer on the host.

free_host deallocates a buffer on the host. *ptr will be set to NULL after successful deallocation.

Parameters

[inout] ptr – address of the pointer to the buffer that should be deallocated. It is expected that *ptr != NULL.

Template Parameters

DataType – can be char, int, unsigned int, float, double, std::complex<float> or std::complex<double>.

template<typename DataType>
void rocalution::set_to_zero_host(int size, DataType *ptr)

Set a host buffer to zero.

set_to_zero_host sets a host buffer to zero.

Parameters
  • [in] size – number of elements

  • [inout] ptr – pointer to the host buffer

Template Parameters

DataType – can be char, int, unsigned int, float, double, std::complex<float> or std::complex<double>.
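
The pointer contract of the three host buffer functions (*ptr == NULL before allocation, *ptr reset to NULL after deallocation) can be pictured with a few stand-in functions. This is a minimal sketch, not the rocALUTION implementation: the sketch_ names are hypothetical, and ignoring a non-positive size or a non-NULL *ptr is an assumption made only for this illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for allocate_host: fills *ptr only if it is still NULL.
template <typename DataType>
void sketch_allocate_host(int size, DataType** ptr)
{
    // Assumption of this sketch: silently ignore invalid requests.
    if(*ptr != nullptr || size <= 0)
    {
        return;
    }
    *ptr = new DataType[static_cast<std::size_t>(size)];
}

// Hypothetical stand-in for set_to_zero_host: writes zero to every element.
template <typename DataType>
void sketch_set_to_zero_host(int size, DataType* ptr)
{
    for(int i = 0; i < size; ++i)
    {
        ptr[i] = DataType(0);
    }
}

// Hypothetical stand-in for free_host: releases the buffer and resets *ptr.
template <typename DataType>
void sketch_free_host(DataType** ptr)
{
    delete[] *ptr;
    *ptr = nullptr;
}
```

With the real library, the calling pattern would be the same: start from a NULL pointer, pass its address to the allocation call, and pass the same address to the deallocation call.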

double rocalution::rocalution_time(void)

Return current time in microseconds.
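
A timestamp in microseconds is typically used in pairs to measure elapsed time. The following sketch shows that pattern with a standard-library clock; sketch_time_us and sketch_elapsed_seconds are hypothetical helpers, and the choice of std::chrono::steady_clock is an assumption, since the clock used by rocalution_time is not stated here.

```cpp
#include <chrono>

// Hypothetical stand-in: current time of a monotonic clock in microseconds.
double sketch_time_us()
{
    const auto now = std::chrono::steady_clock::now().time_since_epoch();
    return std::chrono::duration<double, std::micro>(now).count();
}

// Convert two microsecond timestamps into an elapsed time in seconds.
double sketch_elapsed_seconds(double start_us, double stop_us)
{
    return (stop_us - start_us) / 1e6;
}
```

Taking one timestamp before and one after the code of interest and passing both to sketch_elapsed_seconds yields the wall time in seconds.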

Backend Manager

int rocalution::init_rocalution(int rank = -1, int dev_per_node = 1)

Initialize rocALUTION platform.

init_rocalution defines a backend descriptor with information about the hardware and its specifications. All objects created after that contain a copy of this descriptor. If the specifications of the global descriptor are changed (e.g. a different number of threads is set) and new objects are created, only the new objects will use the new configuration.

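For control, the library provides the functions described in the remainder of this section (for example set_device_rocalution and set_omp_threads_rocalution).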

Example

#include <rocalution.hpp>

using namespace rocalution;

int main(int argc, char* argv[])
{
    init_rocalution();

    // ...

    stop_rocalution();

    return 0;
}

Parameters
  • [in] rank – specifies the MPI rank in a multi-node environment

  • [in] dev_per_node – number of accelerator devices per node in a multi-GPU environment

int rocalution::stop_rocalution(void)

Shutdown rocALUTION platform.

stop_rocalution shuts down the rocALUTION platform.

void rocalution::set_device_rocalution(int dev)

Set the accelerator device.

set_device_rocalution lets the user select the accelerator device that is supposed to be used for the computation.

Parameters

[in] dev – accelerator device ID for computation

void rocalution::set_omp_threads_rocalution(int nthreads)

Set number of OpenMP threads.

The number of threads which rocALUTION will use can be set with set_omp_threads_rocalution or by the global OpenMP environment variable (for Unix-like OS this is OMP_NUM_THREADS). During the initialization phase, the library provides affinity thread-core mapping:

  • If the number of cores (including SMT cores) is greater than or equal to twice the number of threads, then all the threads can occupy every second core ID (e.g. 0, 2, 4, ...). This avoids having two threads working on the same physical core when SMT is enabled.

  • If the number of threads is less than or equal to the number of cores (including SMT cores), and the previous clause is false, then the threads can occupy every core ID (e.g. 0, 1, 2, 3, ...).

  • If none of the above criteria is matched, then the default thread-core mapping is used (typically set by the OS).
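
The three mapping rules listed above can be written as one small decision function. This is only a sketch of the rules as stated in the text, not the library's code; sketch_core_mapping is a hypothetical name, and an empty result stands for "leave the mapping to the OS".

```cpp
#include <vector>

// Hypothetical helper: core ID chosen for each thread, or an empty vector
// when the default (OS) thread-core mapping applies.
std::vector<int> sketch_core_mapping(int ncores, int nthreads)
{
    std::vector<int> cores;
    if(nthreads <= 0)
    {
        return cores;
    }

    int stride = 0;
    if(ncores >= 2 * nthreads)
    {
        stride = 2; // every second core ID: 0, 2, 4, ...
    }
    else if(nthreads <= ncores)
    {
        stride = 1; // every core ID: 0, 1, 2, ...
    }

    if(stride == 0)
    {
        return cores; // more threads than cores: default mapping
    }

    for(int t = 0; t < nthreads; ++t)
    {
        cores.push_back(t * stride);
    }
    return cores;
}
```

For 8 cores, 4 threads land on cores 0, 2, 4, 6, while 6 threads land on cores 0 to 5; 8 threads on 4 cores fall back to the default mapping.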

Note

The thread-core mapping is available only for Unix-like OS.

Note

The user can disable the thread affinity by calling set_omp_affinity_rocalution(), before initializing the library (i.e. before init_rocalution()).

Parameters

[in] nthreads – number of OpenMP threads

void rocalution::set_omp_affinity_rocalution(bool affinity)

Enable/disable OpenMP host affinity.

set_omp_affinity_rocalution enables / disables OpenMP host affinity.

Parameters

[in] affinity – boolean to turn on/off OpenMP host affinity

void rocalution::set_omp_threshold_rocalution(int threshold)

Set OpenMP threshold size.

When working on a small problem, you might observe that the OpenMP host backend is (slightly) slower than using no OpenMP. This is mainly due to the small amount of work each thread performs compared to the large overhead of forking/joining threads. This can be avoided with the OpenMP threshold size parameter in rocALUTION. The default threshold is set to 10000, which means that all matrices of this size or smaller will use only one thread (regardless of the number of OpenMP threads set in the system). The threshold can be modified with set_omp_threshold_rocalution.

Parameters

[in] threshold – OpenMP threshold size
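
The effect of the threshold can be summarized as a one-line rule. The sketch below restates the description above in code; sketch_effective_threads is a hypothetical helper and does not reflect how the library applies the threshold internally.

```cpp
// Hypothetical helper: number of threads used for a problem of the given size.
// Problems up to and including the threshold run on a single thread.
int sketch_effective_threads(long long size, int omp_threads, long long threshold = 10000)
{
    return (size <= threshold) ? 1 : omp_threads;
}
```

With the default threshold, a problem of size 10000 runs on one thread and a problem of size 10001 uses all configured threads; lowering the threshold moves that boundary.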

void rocalution::info_rocalution(void)

Print info about rocALUTION.

info_rocalution prints information about the rocALUTION platform.

void rocalution::info_rocalution(const struct Rocalution_Backend_Descriptor backend_descriptor)

Print info about specific rocALUTION backend descriptor.

info_rocalution prints information about the rocALUTION platform of the specific backend descriptor.

Parameters

[in] backend_descriptor – rocALUTION backend descriptor

void rocalution::disable_accelerator_rocalution(bool onoff = true)

Disable/Enable the accelerator.

If you want to disable the accelerator (without re-compiling the code), you need to call disable_accelerator_rocalution before init_rocalution().

Parameters

[in] onoff – boolean to turn on/off the accelerator

void rocalution::_rocalution_sync(void)

Sync rocALUTION.

_rocalution_sync blocks the host until all active asynchronous transfers are completed.

Base Rocalution

template<typename ValueType>
class rocalution::BaseRocalution : public rocalution::RocalutionObj

Base class for all operators and vectors.

Template Parameters

ValueType – can be int, float, double, std::complex<float> or std::complex<double>.

Subclassed by rocalution::Operator< ValueType >, rocalution::Vector< ValueType >

Public Functions

virtual void MoveToAccelerator(void) = 0

Move the object to the accelerator backend.

virtual void MoveToHost(void) = 0

Move the object to the host backend.

virtual void MoveToAcceleratorAsync(void)

Move the object to the accelerator backend with async move.

virtual void MoveToHostAsync(void)

Move the object to the host backend with async move.

virtual void Sync(void)

Synchronize the object, i.e. wait until a pending asynchronous move has completed.

virtual void CloneBackend(const BaseRocalution<ValueType> &src)

Clone the Backend descriptor from another object.

With CloneBackend, the backend can be cloned without copying any data. This is especially useful, if several objects should reside on the same backend, but keep their original data.

Example

LocalVector<ValueType> vec;
LocalMatrix<ValueType> mat;

// Allocate and initialize vec and mat
// ...

LocalVector<ValueType> tmp;
// By cloning backend, tmp and vec will have the same backend as mat
tmp.CloneBackend(mat);
vec.CloneBackend(mat);

// The following matrix vector multiplication will be performed on the backend
// selected in mat
mat.Apply(vec, &tmp);

Parameters

[in] src – Object, where the backend should be cloned from.

virtual void Info(void) const = 0

Print object information.

Info can print object information about any rocALUTION object. This information consists of object properties and backend data.

Example

mat.Info();
vec.Info();

virtual void Clear(void) = 0

Clear (free all data) the object.

Operator

template<typename ValueType>
class rocalution::Operator : public rocalution::BaseRocalution<ValueType>

Operator class.

The Operator class defines the generic interface for applying an operator (e.g. matrix or stencil) from/to global and local vectors.

Template Parameters

ValueType – can be int, float, double, std::complex<float> or std::complex<double>.

Subclassed by rocalution::GlobalMatrix< ValueType >, rocalution::LocalMatrix< ValueType >, rocalution::LocalStencil< ValueType >

Public Functions

virtual IndexType2 GetM(void) const = 0

Return the number of rows in the matrix/stencil.

virtual IndexType2 GetN(void) const = 0

Return the number of columns in the matrix/stencil.

virtual IndexType2 GetNnz(void) const = 0

Return the number of non-zeros in the matrix/stencil.

virtual int GetLocalM(void) const

Return the number of rows in the local matrix/stencil.

virtual int GetLocalN(void) const

Return the number of columns in the local matrix/stencil.

virtual int GetLocalNnz(void) const

Return the number of non-zeros in the local matrix/stencil.

virtual int GetGhostM(void) const

Return the number of rows in the ghost matrix/stencil.

virtual int GetGhostN(void) const

Return the number of columns in the ghost matrix/stencil.

virtual int GetGhostNnz(void) const

Return the number of non-zeros in the ghost matrix/stencil.

virtual void Apply(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const

Apply the operator, out = Operator(in), where in and out are local vectors.

virtual void ApplyAdd(const LocalVector<ValueType> &in, ValueType scalar, LocalVector<ValueType> *out) const

Apply and add the operator, out += scalar * Operator(in), where in and out are local vectors.

virtual void Apply(const GlobalVector<ValueType> &in, GlobalVector<ValueType> *out) const

Apply the operator, out = Operator(in), where in and out are global vectors.

virtual void ApplyAdd(const GlobalVector<ValueType> &in, ValueType scalar, GlobalVector<ValueType> *out) const

Apply and add the operator, out += scalar * Operator(in), where in and out are global vectors.
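
The difference between Apply (out = Operator(in)) and ApplyAdd (out += scalar * Operator(in)) can be illustrated with a plain CSR matrix-vector product. This is a sketch under assumptions: SketchCsr and both functions are hypothetical, the operator is taken to be a CSR matrix of doubles, and out is assumed to already have nrow entries.

```cpp
#include <vector>

// Hypothetical CSR container used only for this illustration.
struct SketchCsr
{
    int                 nrow;
    std::vector<int>    row_offset; // nrow + 1 entries
    std::vector<int>    col;        // column index of each non-zero
    std::vector<double> val;        // value of each non-zero
};

// out = A * in
void sketch_apply(const SketchCsr& A, const std::vector<double>& in, std::vector<double>& out)
{
    for(int i = 0; i < A.nrow; ++i)
    {
        double sum = 0.0;
        for(int j = A.row_offset[i]; j < A.row_offset[i + 1]; ++j)
        {
            sum += A.val[j] * in[A.col[j]];
        }
        out[i] = sum;
    }
}

// out += scalar * (A * in)
void sketch_apply_add(const SketchCsr&           A,
                      const std::vector<double>& in,
                      double                     scalar,
                      std::vector<double>&       out)
{
    for(int i = 0; i < A.nrow; ++i)
    {
        double sum = 0.0;
        for(int j = A.row_offset[i]; j < A.row_offset[i + 1]; ++j)
        {
            sum += A.val[j] * in[A.col[j]];
        }
        out[i] += scalar * sum;
    }
}
```

Apply overwrites the output vector, whereas ApplyAdd accumulates into it, which is why the output of ApplyAdd must hold meaningful values before the call.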

Vector

template<typename ValueType>
class rocalution::Vector : public rocalution::BaseRocalution<ValueType>

Vector class.

The Vector class defines the generic interface for local and global vectors.

Template Parameters

ValueType – can be int, float, double, std::complex<float> or std::complex<double>.

Subclassed by rocalution::LocalVector< int >, rocalution::GlobalVector< ValueType >, rocalution::LocalVector< ValueType >

virtual void CopyFrom(const LocalVector<ValueType> &src)

Copy vector from another vector.

CopyFrom copies values from another vector.

Example

LocalVector<ValueType> vec1, vec2;

// Allocate and initialize vec1 and vec2
// ...

// Move vec1 to accelerator
vec1.MoveToAccelerator();

// Now, vec1 is on the accelerator (if available)
// and vec2 is on the host

// Copying vec2 into vec1 (or vice versa) will move data between host and
// accelerator backend
vec1.CopyFrom(vec2);

Note

This function allows cross platform copying. One of the objects could be allocated on the accelerator backend.

Parameters

[in] src – Vector, where values should be copied from.

virtual void CopyFrom(const GlobalVector<ValueType> &src)

Copy vector from another vector.

CopyFrom copies values from another vector.

Example

LocalVector<ValueType> vec1, vec2;

// Allocate and initialize vec1 and vec2
// ...

// Move vec1 to accelerator
vec1.MoveToAccelerator();

// Now, vec1 is on the accelerator (if available)
// and vec2 is on the host

// Copying vec2 into vec1 (or vice versa) will move data between host and
// accelerator backend
vec1.CopyFrom(vec2);

Note

This function allows cross platform copying. One of the objects could be allocated on the accelerator backend.

Parameters

[in] src – Vector, where values should be copied from.

virtual void CloneFrom(const LocalVector<ValueType> &src)

Clone the vector.

CloneFrom clones the entire vector, with data and backend descriptor from another Vector.

Example

LocalVector<ValueType> vec;

// Allocate and initialize vec (host or accelerator)
// ...

LocalVector<ValueType> tmp;

// By cloning vec, tmp will have identical values and will be on the same
// backend as vec
tmp.CloneFrom(vec);

Parameters

[in] src – Vector to clone from.

virtual void CloneFrom(const GlobalVector<ValueType> &src)

Clone the vector.

CloneFrom clones the entire vector, with data and backend descriptor from another Vector.

Example

LocalVector<ValueType> vec;

// Allocate and initialize vec (host or accelerator)
// ...

LocalVector<ValueType> tmp;

// By cloning vec, tmp will have identical values and will be on the same
// backend as vec
tmp.CloneFrom(vec);

Parameters

[in] src – Vector to clone from.

Public Functions

virtual IndexType2 GetSize(void) const = 0

Return the size of the vector.

virtual int GetLocalSize(void) const

Return the size of the local vector.

virtual int GetGhostSize(void) const

Return the size of the ghost vector.

virtual bool Check(void) const = 0

Perform a sanity check of the vector.

Checks whether the vector contains valid data, i.e. that no value is infinity or NaN (not a number).

Return values
  • true – if the vector is ok (empty vector is also ok).

  • false – if there is something wrong with the values.
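
The kind of test that Check describes can be sketched for a plain array of doubles. sketch_check is a hypothetical helper that only mirrors the stated rule (no infinity, no NaN, empty is fine); it says nothing about how the library performs the check on an accelerator.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: true if every value is finite (an empty vector is ok).
bool sketch_check(const std::vector<double>& values)
{
    for(double v : values)
    {
        if(!std::isfinite(v))
        {
            return false;
        }
    }
    return true;
}
```

Such a check is commonly placed after reading data from a file or after a solver step that may have diverged.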

virtual void Clear(void) = 0

Clear (free all data) the object.

virtual void Zeros(void) = 0

Set all values of the vector to 0.

virtual void Ones(void) = 0

Set all values of the vector to 1.

virtual void SetValues(ValueType val) = 0

Set all values of the vector to given argument.

virtual void SetRandomUniform(unsigned long long seed, ValueType a = static_cast<ValueType>(-1), ValueType b = static_cast<ValueType>(1)) = 0

Fill the vector with random values from interval [a,b].

virtual void SetRandomNormal(unsigned long long seed, ValueType mean = static_cast<ValueType>(0), ValueType var = static_cast<ValueType>(1)) = 0

Fill the vector with random values from normal distribution.

virtual void ReadFileASCII(const std::string filename) = 0

Read vector from ASCII file.

Read a vector from ASCII file.

Example

LocalVector<ValueType> vec;
vec.ReadFileASCII("my_vector.dat");

Parameters

[in] filename – name of the file containing the ASCII data.

virtual void WriteFileASCII(const std::string filename) const = 0

Write vector to ASCII file.

Write a vector to ASCII file.

Example

LocalVector<ValueType> vec;

// Allocate and fill vec
// ...

vec.WriteFileASCII("my_vector.dat");

Parameters

[in] filename – name of the file to write the ASCII data to.

virtual void ReadFileBinary(const std::string filename) = 0

Read vector from binary file.

Read a vector from binary file. For details on the format, see WriteFileBinary().

Example

LocalVector<ValueType> vec;
vec.ReadFileBinary("my_vector.bin");

Parameters

[in] filename – name of the file containing the data.

virtual void WriteFileBinary(const std::string filename) const = 0

Write vector to binary file.

Write a vector to binary file.

The binary format contains a header, the rocALUTION version and the vector data as follows

// Header
out << "#rocALUTION binary vector file" << std::endl;

// rocALUTION version
out.write((char*)&version, sizeof(int));

// Vector data
out.write((char*)&size, sizeof(int));
out.write((char*)vec_val, size * sizeof(double));

Example

LocalVector<ValueType> vec;

// Allocate and fill vec
// ...

vec.WriteFileBinary("my_vector.bin");

Note

The vector values array is always stored in double precision (e.g. double or std::complex<double>).

Parameters

[in] filename – name of the file to write the data to.
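
The layout described for the binary format (text header line, int version, int size, then size doubles) can be exercised with a small writer and reader pair. This is a sketch for real-valued data only; sketch_write_vector and sketch_read_vector are hypothetical helpers, the version number passed in is arbitrary, and the reader assumes the same endianness and type sizes as the writer.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical writer following the documented layout.
void sketch_write_vector(std::ostream& out, int version, const std::vector<double>& values)
{
    out << "#rocALUTION binary vector file" << std::endl;
    out.write(reinterpret_cast<const char*>(&version), sizeof(int));

    const int size = static_cast<int>(values.size());
    out.write(reinterpret_cast<const char*>(&size), sizeof(int));
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(size * sizeof(double)));
}

// Hypothetical reader: returns false if the header or the data is incomplete.
bool sketch_read_vector(std::istream& in, int& version, std::vector<double>& values)
{
    std::string header;
    std::getline(in, header);
    if(header != "#rocALUTION binary vector file")
    {
        return false;
    }

    int size = 0;
    in.read(reinterpret_cast<char*>(&version), sizeof(int));
    in.read(reinterpret_cast<char*>(&size), sizeof(int));
    if(!in || size < 0)
    {
        return false;
    }

    values.resize(static_cast<std::size_t>(size));
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(size * sizeof(double)));
    return static_cast<bool>(in);
}
```

Both helpers work on any stream, so a round trip through a std::stringstream is enough to try them out; for files, the streams should be opened in binary mode.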

virtual void CopyFromAsync(const LocalVector<ValueType> &src)

Async copy from another local vector.

virtual void CopyFromFloat(const LocalVector<float> &src)

Copy values from another local float vector.

virtual void CopyFromDouble(const LocalVector<double> &src)

Copy values from another local double vector.

virtual void CopyFrom(const LocalVector<ValueType> &src, int src_offset, int dst_offset, int size)

Copy vector from another vector with offsets and size.

CopyFrom copies values with specific source and destination offsets and sizes from another vector.

Note

This function allows cross platform copying. One of the objects could be allocated on the accelerator backend.

Parameters
  • [in] src – Vector, where values should be copied from.

  • [in] src_offset – source offset.

  • [in] dst_offset – destination offset.

  • [in] size – number of entries to be copied.
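
The meaning of the three integer arguments can be shown on plain arrays: size entries starting at src_offset in the source are written starting at dst_offset in the destination. The helper below is hypothetical, and rejecting out-of-range requests with a false return value is an assumption of this sketch; the behaviour of the library in that case is not described here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: dst[dst_offset + i] = src[src_offset + i] for i < size.
bool sketch_copy_range(std::vector<double>&       dst,
                       const std::vector<double>& src,
                       int                        src_offset,
                       int                        dst_offset,
                       int                        size)
{
    if(src_offset < 0 || dst_offset < 0 || size < 0)
    {
        return false;
    }

    const std::size_t n = static_cast<std::size_t>(size);
    if(static_cast<std::size_t>(src_offset) + n > src.size()
       || static_cast<std::size_t>(dst_offset) + n > dst.size())
    {
        return false; // assumption: refuse ranges that do not fit
    }

    std::copy_n(src.begin() + src_offset, n, dst.begin() + dst_offset);
    return true;
}
```

Entries of the destination outside the range [dst_offset, dst_offset + size) are left untouched.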

virtual void AddScale(const LocalVector<ValueType> &x, ValueType alpha)

Perform vector update of type this = this + alpha * x.

virtual void AddScale(const GlobalVector<ValueType> &x, ValueType alpha)

Perform vector update of type this = this + alpha * x.

virtual void ScaleAdd(ValueType alpha, const LocalVector<ValueType> &x)

Perform vector update of type this = alpha * this + x.

virtual void ScaleAdd(ValueType alpha, const GlobalVector<ValueType> &x)

Perform vector update of type this = alpha * this + x.

virtual void ScaleAddScale(ValueType alpha, const LocalVector<ValueType> &x, ValueType beta)

Perform vector update of type this = alpha * this + x * beta.

virtual void ScaleAddScale(ValueType alpha, const GlobalVector<ValueType> &x, ValueType beta)

Perform vector update of type this = alpha * this + x * beta.

virtual void ScaleAddScale(ValueType alpha, const LocalVector<ValueType> &x, ValueType beta, int src_offset, int dst_offset, int size)

Perform vector update of type this = alpha * this + x * beta with offsets.

virtual void ScaleAddScale(ValueType alpha, const GlobalVector<ValueType> &x, ValueType beta, int src_offset, int dst_offset, int size)

Perform vector update of type this = alpha * this + x * beta with offsets.

virtual void ScaleAdd2(ValueType alpha, const LocalVector<ValueType> &x, ValueType beta, const LocalVector<ValueType> &y, ValueType gamma)

Perform vector update of type this = alpha * this + x * beta + y * gamma.

virtual void ScaleAdd2(ValueType alpha, const GlobalVector<ValueType> &x, ValueType beta, const GlobalVector<ValueType> &y, ValueType gamma)

Perform vector update of type this = alpha * this + x * beta + y * gamma.
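
The update formulas of AddScale, ScaleAdd, ScaleAddScale and ScaleAdd2 differ only in where the scalars are applied. The sketch below writes each formula out for std::vector<double>; the function names are hypothetical, the first argument plays the role of this, and all vectors are assumed to have the same length.

```cpp
#include <cstddef>
#include <vector>

using SketchVec = std::vector<double>;

// self = self + alpha * x
void sketch_add_scale(SketchVec& self, const SketchVec& x, double alpha)
{
    for(std::size_t i = 0; i < self.size(); ++i)
        self[i] = self[i] + alpha * x[i];
}

// self = alpha * self + x
void sketch_scale_add(SketchVec& self, double alpha, const SketchVec& x)
{
    for(std::size_t i = 0; i < self.size(); ++i)
        self[i] = alpha * self[i] + x[i];
}

// self = alpha * self + x * beta
void sketch_scale_add_scale(SketchVec& self, double alpha, const SketchVec& x, double beta)
{
    for(std::size_t i = 0; i < self.size(); ++i)
        self[i] = alpha * self[i] + x[i] * beta;
}

// self = alpha * self + x * beta + y * gamma
void sketch_scale_add2(SketchVec&       self,
                       double           alpha,
                       const SketchVec& x,
                       double           beta,
                       const SketchVec& y,
                       double           gamma)
{
    for(std::size_t i = 0; i < self.size(); ++i)
        self[i] = alpha * self[i] + x[i] * beta + y[i] * gamma;
}
```

Note the argument order: AddScale takes the vector first and the scalar second, while the ScaleAdd variants take the scalar that multiplies this first.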

virtual void Scale(ValueType alpha) = 0

Perform vector scaling this = alpha * this.

virtual ValueType Dot(const LocalVector<ValueType> &x) const

Compute dot (scalar) product, return this^T x.

virtual ValueType Dot(const GlobalVector<ValueType> &x) const

Compute dot (scalar) product, return this^T x.

virtual ValueType DotNonConj(const LocalVector<ValueType> &x) const

Compute non-conjugate dot (scalar) product, return this^T x.

virtual ValueType DotNonConj(const GlobalVector<ValueType> &x) const

Compute non-conjugate dot (scalar) product, return this^T x.
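
For real value types both products coincide; they differ only for complex values. The sketch below contrasts the two for std::complex<double>. The function names are hypothetical, and conjugating the first operand (this) in the conjugate variant is an assumption of this sketch, since the text above does not state which operand is conjugated.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using SketchComplex = std::complex<double>;

// Conjugate dot product: sum over conj(self[i]) * x[i] (assumed convention).
SketchComplex sketch_dot(const std::vector<SketchComplex>& self,
                         const std::vector<SketchComplex>& x)
{
    SketchComplex sum(0.0, 0.0);
    for(std::size_t i = 0; i < self.size(); ++i)
        sum += std::conj(self[i]) * x[i];
    return sum;
}

// Non-conjugate dot product: sum over self[i] * x[i].
SketchComplex sketch_dot_non_conj(const std::vector<SketchComplex>& self,
                                  const std::vector<SketchComplex>& x)
{
    SketchComplex sum(0.0, 0.0);
    for(std::size_t i = 0; i < self.size(); ++i)
        sum += self[i] * x[i];
    return sum;
}
```

For the single-entry vector containing the imaginary unit, the conjugate product with itself is 1, while the non-conjugate product is -1.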

virtual ValueType Norm(void) const = 0

Compute \(L_2\) norm of the vector, return = sqrt(this^T this)

virtual ValueType Reduce(void) const = 0

Reduce the vector.

virtual ValueType Asum(void) const = 0

Compute the sum of absolute values of the vector, return = sum(|this|)

virtual int Amax(ValueType &value) const = 0

Compute the absolute max of the vector, return = index(max(|this|))
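
The reductions Norm, Asum and Amax can be written out for a plain array of doubles. The helpers below are hypothetical; in particular, storing the absolute value of the selected entry in value and returning -1 for an empty vector are assumptions of this sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// L2 norm: sqrt(sum of squares).
double sketch_norm(const std::vector<double>& v)
{
    double sum = 0.0;
    for(double e : v)
        sum += e * e;
    return std::sqrt(sum);
}

// Sum of absolute values.
double sketch_asum(const std::vector<double>& v)
{
    double sum = 0.0;
    for(double e : v)
        sum += std::fabs(e);
    return sum;
}

// Index of the entry with the largest absolute value; value receives |v[index]|.
int sketch_amax(const std::vector<double>& v, double& value)
{
    int index = -1; // assumption: -1 signals an empty vector
    value     = 0.0;
    for(std::size_t i = 0; i < v.size(); ++i)
    {
        const double a = std::fabs(v[i]);
        if(index < 0 || a > value)
        {
            index = static_cast<int>(i);
            value = a;
        }
    }
    return index;
}
```

For the vector (3, -4) the norm is 5, the absolute sum is 7, and the absolute maximum is 4 at index 1.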

virtual void PointWiseMult(const LocalVector<ValueType> &x)

Perform point-wise multiplication (element-wise) of this = this * x.

virtual void PointWiseMult(const GlobalVector<ValueType> &x)

Perform point-wise multiplication (element-wise) of this = this * x.

virtual void PointWiseMult(const LocalVector<ValueType> &x, const LocalVector<ValueType> &y)

Perform point-wise multiplication (element-wise) of this = x * y.

virtual void PointWiseMult(const GlobalVector<ValueType> &x, const GlobalVector<ValueType> &y)

Perform point-wise multiplication (element-wise) of this = x * y.

virtual void Power(double power) = 0

Perform power operation to a vector.

Local Matrix

template<typename ValueType>
class rocalution::LocalMatrix : public rocalution::Operator<ValueType>

LocalMatrix class.

A LocalMatrix is called local because it always stays on a single system. The system can contain several CPUs via a UMA or NUMA memory system, or it can contain an accelerator.

Template Parameters

ValueType – can be int, float, double, std::complex<float> or std::complex<double>.

void AllocateCSR(const std::string name, int nnz, int nrow, int ncol)

Allocate a local matrix with name and sizes.

The local matrix allocation functions require a name of the object (this is only for information purposes) and corresponding number of non-zero elements, number of rows and number of columns. Furthermore, depending on the matrix format, additional parameters are required.

Example

LocalMatrix<ValueType> mat;

mat.AllocateCSR("my CSR matrix", 456, 100, 100);
mat.Clear();

mat.AllocateCOO("my COO matrix", 200, 100, 100);
mat.Clear();

void AllocateBCSR(const std::string name, int nnzb, int nrowb, int ncolb, int blockdim)

Allocate a local matrix with name and sizes.

The local matrix allocation functions require a name of the object (this is only for information purposes) and corresponding number of non-zero elements, number of rows and number of columns. Furthermore, depending on the matrix format, additional parameters are required.

Example

LocalMatrix<ValueType> mat;

mat.AllocateCSR("my CSR matrix", 456, 100, 100);
mat.Clear();

mat.AllocateCOO("my COO matrix", 200, 100, 100);
mat.Clear();

void AllocateMCSR(const std::string name, int nnz, int nrow, int ncol)

Allocate a local matrix with name and sizes.

The local matrix allocation functions require a name of the object (this is only for information purposes) and corresponding number of non-zero elements, number of rows and number of columns. Furthermore, depending on the matrix format, additional parameters are required.

Example

LocalMatrix<ValueType> mat;

mat.AllocateCSR("my CSR matrix", 456, 100, 100);
mat.Clear();

mat.AllocateCOO("my COO matrix", 200, 100, 100);
mat.Clear();

void AllocateCOO(const std::string name, int nnz, int nrow, int ncol)

Allocate a local matrix with name and sizes.

The local matrix allocation functions require a name of the object (this is only for information purposes) and corresponding number of non-zero elements, number of rows and number of columns. Furthermore, depending on the matrix format, additional parameters are required.

Example

LocalMatrix<ValueType> mat;

mat.AllocateCSR("my CSR matrix", 456, 100, 100);
mat.Clear();

mat.AllocateCOO("my COO matrix", 200, 100, 100);
mat.Clear();

void AllocateDIA(const std::string name, int nnz, int nrow, int ncol, int ndiag)

Allocate a local matrix with name and sizes.

The local matrix allocation functions require a name of the object (this is only for information purposes) and corresponding number of non-zero elements, number of rows and number of columns. Furthermore, depending on the matrix format, additional parameters are required.

Example

LocalMatrix<ValueType> mat;

mat.AllocateCSR("my CSR matrix", 456, 100, 100);
mat.Clear();

mat.AllocateCOO("my COO matrix", 200, 100, 100);
mat.Clear();

void AllocateELL(const std::string name, int nnz, int nrow, int ncol, int max_row)

Allocate a local matrix with name and sizes.

The local matrix allocation functions require a name of the object (this is only for information purposes) and corresponding number of non-zero elements, number of rows and number of columns. Furthermore, depending on the matrix format, additional parameters are required.

Example

LocalMatrix<ValueType> mat;

mat.AllocateCSR("my CSR matrix", 456, 100, 100);
mat.Clear();

mat.AllocateCOO("my COO matrix", 200, 100, 100);
mat.Clear();

void AllocateHYB(const std::string name, int ell_nnz, int coo_nnz, int ell_max_row, int nrow, int ncol)

Allocate a local matrix with name and sizes.

The local matrix allocation functions require a name of the object (this is only for information purposes) and corresponding number of non-zero elements, number of rows and number of columns. Furthermore, depending on the matrix format, additional parameters are required.

Example

LocalMatrix<ValueType> mat;

mat.AllocateCSR("my CSR matrix", 456, 100, 100);
mat.Clear();

mat.AllocateCOO("my COO matrix", 200, 100, 100);
mat.Clear();

void AllocateDENSE(const std::string name, int nrow, int ncol)

Allocate a local matrix with name and sizes.

The local matrix allocation functions require a name of the object (this is only for information purposes) and corresponding number of non-zero elements, number of rows and number of columns. Furthermore, depending on the matrix format, additional parameters are required.

Example

LocalMatrix<ValueType> mat;

mat.AllocateCSR("my CSR matrix", 456, 100, 100);
mat.Clear();

mat.AllocateCOO("my COO matrix", 200, 100, 100);
mat.Clear();

void SetDataPtrCOO(int **row, int **col, ValueType **val, std::string name, int nnz, int nrow, int ncol)

Initialize a LocalMatrix on the host with externally allocated data.

SetDataPtr functions have direct access to the raw data via pointers. Already allocated data can be set by passing their pointers.

Example

// Allocate a CSR matrix
int* csr_row_ptr   = new int[100 + 1];
int* csr_col_ind   = new int[345];
ValueType* csr_val = new ValueType[345];

// Fill the CSR matrix
// ...

// rocALUTION local matrix object
LocalMatrix<ValueType> mat;

// Set the CSR matrix data; csr_row_ptr, csr_col_ind and csr_val are set to
// NULL afterwards
mat.SetDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val, "my_matrix", 345, 100, 100);

Note

Setting data pointers will leave the original pointers empty (set to NULL).

void SetDataPtrCSR(int **row_offset, int **col, ValueType **val, std::string name, int nnz, int nrow, int ncol)

Initialize a LocalMatrix on the host with externally allocated data.

SetDataPtr functions have direct access to the raw data via pointers. Already allocated data can be set by passing their pointers.

Example

// Allocate a CSR matrix
int* csr_row_ptr   = new int[100 + 1];
int* csr_col_ind   = new int[345];
ValueType* csr_val = new ValueType[345];

// Fill the CSR matrix
// ...

// rocALUTION local matrix object
LocalMatrix<ValueType> mat;

// Set the CSR matrix data; csr_row_ptr, csr_col_ind and csr_val are set to
// NULL afterwards
mat.SetDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val, "my_matrix", 345, 100, 100);

Note

Setting data pointers will leave the original pointers empty (set to NULL).

void SetDataPtrBCSR(int **row_offset, int **col, ValueType **val, std::string name, int nnzb, int nrowb, int ncolb, int blockdim)

Initialize a LocalMatrix on the host with externally allocated data.

SetDataPtr functions have direct access to the raw data via pointers. Already allocated data can be set by passing their pointers.

Example

// Allocate a CSR matrix
int* csr_row_ptr   = new int[100 + 1];
int* csr_col_ind   = new int[345];
ValueType* csr_val = new ValueType[345];

// Fill the CSR matrix
// ...

// rocALUTION local matrix object
LocalMatrix<ValueType> mat;

// Set the CSR matrix data; csr_row_ptr, csr_col_ind and csr_val are set to
// NULL afterwards
mat.SetDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val, "my_matrix", 345, 100, 100);

Note

Setting data pointers will leave the original pointers empty (set to NULL).

void SetDataPtrMCSR(int **row_offset, int **col, ValueType **val, std::string name, int nnz, int nrow, int ncol)

Initialize a LocalMatrix on the host with externally allocated data.

SetDataPtr functions have direct access to the raw data via pointers. Already allocated data can be set by passing their pointers.

Example

// Allocate a CSR matrix
int* csr_row_ptr   = new int[100 + 1];
int* csr_col_ind   = new int[345];
ValueType* csr_val = new ValueType[345];

// Fill the CSR matrix
// ...

// rocALUTION local matrix object
LocalMatrix<ValueType> mat;

// Set the CSR matrix data; csr_row_ptr, csr_col_ind and csr_val are set to
// NULL afterwards
mat.SetDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val, "my_matrix", 345, 100, 100);

Note

Setting data pointers will leave the original pointers empty (set to NULL).

void SetDataPtrELL(int **col, ValueType **val, std::string name, int nnz, int nrow, int ncol, int max_row)

Initialize a LocalMatrix on the host with externally allocated data.

SetDataPtr functions have direct access to the raw data via pointers. Already allocated data can be set by passing their pointers.

Example

// Allocate a CSR matrix
int* csr_row_ptr   = new int[100 + 1];
int* csr_col_ind   = new int[345];
ValueType* csr_val = new ValueType[345];

// Fill the CSR matrix
// ...

// rocALUTION local matrix object
LocalMatrix<ValueType> mat;

// Set the CSR matrix data; csr_row_ptr, csr_col_ind and csr_val are set to
// NULL afterwards
mat.SetDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val, "my_matrix", 345, 100, 100);

Note

Setting data pointers will leave the original pointers empty (set to NULL).

void SetDataPtrDIA(int **offset, ValueType **val, std::string name, int nnz, int nrow, int ncol, int num_diag)

Initialize a LocalMatrix on the host with externally allocated data.

SetDataPtr functions have direct access to the raw data via pointers. Already allocated data can be set by passing their pointers.

Example

// Allocate a CSR matrix
int* csr_row_ptr   = new int[100 + 1];
int* csr_col_ind   = new int[345];
ValueType* csr_val = new ValueType[345];

// Fill the CSR matrix
// ...

// rocALUTION local matrix object
LocalMatrix<ValueType> mat;

// Set the CSR matrix data; csr_row_ptr, csr_col_ind and csr_val are set to
// NULL afterwards
mat.SetDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val, "my_matrix", 345, 100, 100);

Note

Setting data pointers will leave the original pointers empty (set to NULL).

void SetDataPtrDENSE(ValueType **val, std::string name, int nrow, int ncol)

Initialize a LocalMatrix on the host with externally allocated data.

SetDataPtr functions have direct access to the raw data via pointers. Already allocated data can be set by passing their pointers.

Example

// Allocate a CSR matrix
int* csr_row_ptr   = new int[100 + 1];
int* csr_col_ind   = new int[345];
ValueType* csr_val = new ValueType[345];

// Fill the CSR matrix
// ...

// rocALUTION local matrix object
LocalMatrix<ValueType> mat;

// Set the CSR matrix data; csr_row_ptr, csr_col_ind and csr_val are set to
// NULL afterwards
mat.SetDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val, "my_matrix", 345, 100, 100);

Note

Setting data pointers will leave the original pointers empty (set to NULL).

void LeaveDataPtrCOO(int **row, int **col, ValueType **val)

Leave a LocalMatrix to host pointers.

LeaveDataPtr functions have direct access to the raw data via pointers. A LocalMatrix object can leave its raw data to host pointers. This will leave the LocalMatrix empty.

Example

// rocALUTION CSR matrix object
LocalMatrix<ValueType> mat;

// Allocate the CSR matrix
mat.AllocateCSR("my_matrix", 345, 100, 100);

// Fill CSR matrix
// ...

int* csr_row_ptr   = NULL;
int* csr_col_ind   = NULL;
ValueType* csr_val = NULL;

// Get (steal) the data from the matrix, this will leave the local matrix
// object empty
mat.LeaveDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val);

void LeaveDataPtrCSR(int **row_offset, int **col, ValueType **val)

Leave a LocalMatrix to host pointers.

LeaveDataPtr functions have direct access to the raw data via pointers. A LocalMatrix object can leave its raw data to host pointers. This will leave the LocalMatrix empty.

Example

// rocALUTION CSR matrix object
LocalMatrix<ValueType> mat;

// Allocate the CSR matrix
mat.AllocateCSR("my_matrix", 345, 100, 100);

// Fill CSR matrix
// ...

int* csr_row_ptr   = NULL;
int* csr_col_ind   = NULL;
ValueType* csr_val = NULL;

// Get (steal) the data from the matrix, this will leave the local matrix
// object empty
mat.LeaveDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val);
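
The ownership hand-over behind the SetDataPtr and LeaveDataPtr families can be pictured with a small owning struct: setting moves the raw arrays into the object and nulls the caller's pointers, leaving moves them back out and empties the object. SketchCsrOwner is a hypothetical type written only for this illustration; it uses new[]/delete[] for simplicity and is not how LocalMatrix manages memory.

```cpp
#include <cstddef>

// Hypothetical owner of three CSR arrays, illustrating pointer hand-over.
struct SketchCsrOwner
{
    int*    row_offset = nullptr;
    int*    col        = nullptr;
    double* val        = nullptr;

    SketchCsrOwner() = default;
    SketchCsrOwner(const SketchCsrOwner&)            = delete;
    SketchCsrOwner& operator=(const SketchCsrOwner&) = delete;
    ~SketchCsrOwner() { clear(); }

    // Take ownership of the caller's arrays; the caller's pointers become NULL.
    void set_data_ptr(int** r, int** c, double** v)
    {
        clear();
        row_offset = *r;
        col        = *c;
        val        = *v;
        *r         = nullptr;
        *c         = nullptr;
        *v         = nullptr;
    }

    // Hand the arrays back to the caller; this object becomes empty.
    void leave_data_ptr(int** r, int** c, double** v)
    {
        *r         = row_offset;
        *c         = col;
        *v         = val;
        row_offset = nullptr;
        col        = nullptr;
        val        = nullptr;
    }

    // Free whatever the object currently owns.
    void clear()
    {
        delete[] row_offset;
        delete[] col;
        delete[] val;
        row_offset = nullptr;
        col        = nullptr;
        val        = nullptr;
    }
};
```

After a leave operation the caller is responsible for releasing the arrays again, because the object no longer refers to them.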

void LeaveDataPtrBCSR(int **row_offset, int **col, ValueType **val, int &blockdim)

Leave a LocalMatrix to host pointers.

LeaveDataPtr functions have direct access to the raw data via pointers. A LocalMatrix object can leave its raw data to host pointers. This will leave the LocalMatrix empty.

Example

// rocALUTION CSR matrix object
LocalMatrix<ValueType> mat;

// Allocate the CSR matrix
mat.AllocateCSR("my_matrix", 345, 100, 100);

// Fill CSR matrix
// ...

int* csr_row_ptr   = NULL;
int* csr_col_ind   = NULL;
ValueType* csr_val = NULL;

// Get (steal) the data from the matrix, this will leave the local matrix
// object empty
mat.LeaveDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val);

void LeaveDataPtrMCSR(int **row_offset, int **col, ValueType **val)

Leave a LocalMatrix to host pointers.

LeaveDataPtr functions have direct access to the raw data via pointers. A LocalMatrix object can leave its raw data to host pointers. This will leave the LocalMatrix empty.

Example

// rocALUTION CSR matrix object
LocalMatrix<ValueType> mat;

// Allocate the CSR matrix
mat.AllocateCSR("my_matrix", 345, 100, 100);

// Fill CSR matrix
// ...

int* csr_row_ptr   = NULL;
int* csr_col_ind   = NULL;
ValueType* csr_val = NULL;

// Get (steal) the data from the matrix, this will leave the local matrix
// object empty
mat.LeaveDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val);

void LeaveDataPtrELL(int **col, ValueType **val, int &max_row)

Leave a LocalMatrix to host pointers.

LeaveDataPtr functions have direct access to the raw data via pointers. A LocalMatrix object can leave its raw data to host pointers. This will leave the LocalMatrix empty.

Example

// rocALUTION CSR matrix object
LocalMatrix<ValueType> mat;

// Allocate the CSR matrix
mat.AllocateCSR("my_matrix", 345, 100, 100);

// Fill CSR matrix
// ...

int* csr_row_ptr   = NULL;
int* csr_col_ind   = NULL;
ValueType* csr_val = NULL;

// Get (steal) the data from the matrix, this will leave the local matrix
// object empty
mat.LeaveDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val);

void LeaveDataPtrDIA(int **offset, ValueType **val, int &num_diag)

Leave a LocalMatrix to host pointers.

LeaveDataPtr functions have direct access to the raw data via pointers. A LocalMatrix object can leave its raw data to host pointers. This will leave the LocalMatrix empty.

Example

// rocALUTION CSR matrix object
LocalMatrix<ValueType> mat;

// Allocate the CSR matrix
mat.AllocateCSR("my_matrix", 345, 100, 100);

// Fill CSR matrix
// ...

int* csr_row_ptr   = NULL;
int* csr_col_ind   = NULL;
ValueType* csr_val = NULL;

// Get (steal) the data from the matrix, this will leave the local matrix
// object empty
mat.LeaveDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val);

void LeaveDataPtrDENSE(ValueType **val)

Leave a LocalMatrix to host pointers.

LeaveDataPtr functions have direct access to the raw data via pointers. A LocalMatrix object can leave its raw data to host pointers. This will leave the LocalMatrix empty.

Example

// rocALUTION CSR matrix object
LocalMatrix<ValueType> mat;

// Allocate the CSR matrix
mat.AllocateCSR("my_matrix", 345, 100, 100);

// Fill CSR matrix
// ...

int* csr_row_ptr   = NULL;
int* csr_col_ind   = NULL;
ValueType* csr_val = NULL;

// Get (steal) the data from the matrix, this will leave the local matrix
// object empty
mat.LeaveDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val);

Public Functions

virtual void Info(void) const

Print object information.

Info can print object information about any rocALUTION object. This information consists of object properties and backend data.

Example

mat.Info();
vec.Info();

unsigned int GetFormat(void) const

Return the matrix format id (see matrix_formats.hpp)

virtual IndexType2 GetM(void) const

Return the number of rows in the matrix/stencil.

virtual IndexType2 GetN(void) const

Return the number of columns in the matrix/stencil.

virtual IndexType2 GetNnz(void) const

Return the number of non-zeros in the matrix/stencil.

bool Check(void) const

Perform a sanity check of the matrix.

Checks whether the matrix contains valid data, i.e. whether the values are finite (neither infinity nor NaN, not a number) and whether the structure of the matrix is correct (e.g. indices cannot be negative, CSR and COO matrices have to be sorted, etc.).

Returns
  • true – if the matrix is ok (empty matrix is also ok).

  • false – if there is something wrong with the structure or values.
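The conditions named for Check can be illustrated on raw CSR arrays. The following is a minimal sketch in plain C++, not the library's implementation; the helper name csr_is_valid and the use of std::vector storage are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of rocALUTION: validates a CSR matrix stored
// in plain std::vector arrays using the conditions described for Check().
bool csr_is_valid(int                        nrow,
                  int                        ncol,
                  const std::vector<int>&    row_ptr,
                  const std::vector<int>&    col,
                  const std::vector<double>& val)
{
    // An empty matrix counts as valid
    if(nrow == 0)
    {
        return col.empty() && val.empty();
    }

    const int nnz = static_cast<int>(col.size());

    // Array sizes must be consistent with each other
    if(nrow < 0 || ncol < 0 || val.size() != col.size()
       || row_ptr.size() != static_cast<std::size_t>(nrow) + 1
       || row_ptr[0] != 0 || row_ptr[nrow] != nnz)
    {
        return false;
    }

    for(int i = 0; i < nrow; ++i)
    {
        // Row offsets must be non-decreasing and stay inside the arrays
        if(row_ptr[i] > row_ptr[i + 1] || row_ptr[i + 1] > nnz)
        {
            return false;
        }

        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            // Column indices must be in range
            if(col[k] < 0 || col[k] >= ncol)
            {
                return false;
            }

            // Column indices must be sorted within a row
            if(k > row_ptr[i] && col[k - 1] > col[k])
            {
                return false;
            }

            // Values must be neither infinity nor NaN
            if(!std::isfinite(val[k]))
            {
                return false;
            }
        }
    }

    return true;
}
```

Whether the library tolerates duplicate column indices within a row is not stated here; the sketch accepts them.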

virtual void Clear(void)

Clear (free all data) the object.

void Zeros(void)

Set all matrix values to zero.

void Scale(ValueType alpha)

Scale all values in the matrix.

void ScaleDiagonal(ValueType alpha)

Scale the diagonal entries of the matrix with alpha, all diagonal elements must exist.

void ScaleOffDiagonal(ValueType alpha)

Scale the off-diagonal entries of the matrix with alpha, all diagonal elements must exist.

void AddScalar(ValueType alpha)

Add a scalar to all matrix values.

void AddScalarDiagonal(ValueType alpha)

Add alpha to the diagonal entries of the matrix, all diagonal elements must exist.

void AddScalarOffDiagonal(ValueType alpha)

Add alpha to the off-diagonal entries of the matrix, all diagonal elements must exist.

void ExtractSubMatrix(int row_offset, int col_offset, int row_size, int col_size, LocalMatrix<ValueType> *mat) const

Extract a sub-matrix with row/col_offset and row/col_size.

void ExtractSubMatrices(int row_num_blocks, int col_num_blocks, const int *row_offset, const int *col_offset, LocalMatrix<ValueType> ***mat) const

Extract an array of non-overlapping sub-matrices. row_num_blocks and col_num_blocks define the number of blocks in the row and column direction; row_offset and col_offset have sizes row_num_blocks+1 and col_num_blocks+1, where offset[i+1]-offset[i] is the size of the i-th block.

void ExtractDiagonal(LocalVector<ValueType> *vec_diag) const

Extract the diagonal values of the matrix into a LocalVector.

void ExtractInverseDiagonal(LocalVector<ValueType> *vec_inv_diag) const

Extract the inverse (reciprocal) diagonal values of the matrix into a LocalVector.

void ExtractU(LocalMatrix<ValueType> *U, bool diag) const

Extract the upper triangular matrix.

void ExtractL(LocalMatrix<ValueType> *L, bool diag) const

Extract the lower triangular matrix.

void Permute(const LocalVector<int> &permutation)

Perform (forward) permutation of the matrix.

void PermuteBackward(const LocalVector<int> &permutation)

Perform (backward) permutation of the matrix.

void CMK(LocalVector<int> *permutation) const

Create permutation vector for CMK reordering of the matrix.

The Cuthill-McKee ordering reduces the bandwidth of a given sparse matrix.

Example

LocalVector<int> cmk;

mat.CMK(&cmk);
mat.Permute(cmk);

Parameters

[out] permutation – permutation vector for CMK reordering

void RCMK(LocalVector<int> *permutation) const

Create permutation vector for reverse CMK reordering of the matrix.

The Reverse Cuthill-McKee ordering reduces the bandwidth of a given sparse matrix.

Example

LocalVector<int> rcmk;

mat.RCMK(&rcmk);
mat.Permute(rcmk);

Parameters

[out] permutation – permutation vector for reverse CMK reordering

void ConnectivityOrder(LocalVector<int> *permutation) const

Create permutation vector for connectivity reordering of the matrix.

Connectivity ordering returns a permutation that sorts the matrix rows by their number of non-zero entries.

Example

LocalVector<int> conn;

mat.ConnectivityOrder(&conn);
mat.Permute(conn);

Parameters

[out] permutation – permutation vector for connectivity reordering
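The idea behind connectivity ordering can be sketched on a raw CSR row offset array. This is an illustrative plain C++ sketch, not the library's implementation; the helper name order_rows_by_nnz, the ascending order, and the convention that entry k holds the original index of the row placed at position k are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of rocALUTION: lists the row indices ordered
// by their number of non-zero entries. Ascending order and stable tie-breaking
// are assumptions of this sketch.
std::vector<int> order_rows_by_nnz(const std::vector<int>& row_ptr)
{
    assert(!row_ptr.empty());

    const int nrow = static_cast<int>(row_ptr.size()) - 1;

    // Start from the identity ordering 0, 1, ..., nrow - 1
    std::vector<int> perm(nrow);
    std::iota(perm.begin(), perm.end(), 0);

    // Sort the row indices by the length of each row
    std::stable_sort(perm.begin(), perm.end(), [&](int a, int b) {
        return (row_ptr[a + 1] - row_ptr[a]) < (row_ptr[b + 1] - row_ptr[b]);
    });

    return perm;
}
```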

void MultiColoring(int &num_colors, int **size_colors, LocalVector<int> *permutation) const

Perform multi-coloring decomposition of the matrix.

The Multi-Coloring algorithm builds a permutation (coloring of the matrix) in a way such that no two adjacent nodes in the sparse matrix have the same color.

Example

LocalVector<int> mc;
int num_colors;
int* block_colors = NULL;

mat.MultiColoring(num_colors, &block_colors, &mc);
mat.Permute(mc);

Parameters
  • [out] num_colors – number of colors

  • [out] size_colors – pointer to array that holds the number of nodes for each color

  • [out] permutation – permutation vector for multi-coloring reordering
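A simple greedy scheme shows what a valid coloring looks like. The sketch below is plain C++ and is not claimed to be the algorithm used by the library; the helper name greedy_coloring is hypothetical, and the sparsity pattern is assumed to be structurally symmetric.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of rocALUTION: greedy coloring of the graph
// given by a CSR sparsity pattern. Returns the color of each node and writes
// the number of colors used to num_colors. The pattern is assumed to be
// structurally symmetric.
std::vector<int> greedy_coloring(const std::vector<int>& row_ptr,
                                 const std::vector<int>& col,
                                 int&                    num_colors)
{
    assert(!row_ptr.empty());

    const int        n = static_cast<int>(row_ptr.size()) - 1;
    std::vector<int> color(n, -1);

    num_colors = 0;

    for(int i = 0; i < n; ++i)
    {
        // Mark the colors already taken by neighbors of node i; the extra
        // slot guarantees that a free color is always found
        std::vector<bool> used(num_colors + 1, false);

        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            const int j = col[k];

            if(j != i && color[j] >= 0)
            {
                used[color[j]] = true;
            }
        }

        // Pick the smallest free color
        int c = 0;
        while(used[c])
        {
            ++c;
        }

        color[i] = c;

        if(c + 1 > num_colors)
        {
            num_colors = c + 1;
        }
    }

    return color;
}
```

Grouping the nodes by color then yields a permutation in which each color forms one block; for a tridiagonal pattern this greedy scheme produces the familiar two-color (red-black) ordering.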

void MaximalIndependentSet(int &size, LocalVector<int> *permutation) const

Perform maximal independent set decomposition of the matrix.

The Maximal Independent Set algorithm finds a set of maximal size whose elements do not depend on any other element in the set.

Example

LocalVector<int> mis;
int size;

mat.MaximalIndependentSet(size, &mis);
mat.Permute(mis);

Parameters
  • [out] size – number of independent sets

  • [out] permutation – permutation vector for maximal independent set reordering
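The notion of an independent set can be illustrated with a greedy pass over a CSR sparsity pattern. This is a plain C++ sketch under assumptions, not the library's algorithm: the helper name greedy_mis is hypothetical, the pattern is assumed to be structurally symmetric, and the result is maximal in the sense that no further node can be added, which is not necessarily the largest possible set.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of rocALUTION: greedy construction of an
// independent set on a structurally symmetric CSR sparsity pattern.
std::vector<int> greedy_mis(const std::vector<int>& row_ptr,
                            const std::vector<int>& col)
{
    assert(!row_ptr.empty());

    const int n = static_cast<int>(row_ptr.size()) - 1;

    // 0 = undecided, 1 = member of the set, 2 = excluded
    std::vector<char> state(n, 0);
    std::vector<int>  mis;

    for(int i = 0; i < n; ++i)
    {
        if(state[i] != 0)
        {
            continue;
        }

        // Node i has no neighbor in the set yet, so it can join
        state[i] = 1;
        mis.push_back(i);

        // All undecided neighbors of i are excluded from the set
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            const int j = col[k];

            if(j != i && state[j] == 0)
            {
                state[j] = 2;
            }
        }
    }

    return mis;
}
```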

void ZeroBlockPermutation(int &size, LocalVector<int> *permutation) const

Return a permutation for saddle-point problems (zero diagonal entries)

For saddle-point problems (i.e. matrices with zero diagonal entries), the Zero Block Permutation maps all zero-diagonal elements to the last block of the matrix.

Example

LocalVector<int> zbp;
int size;

mat.ZeroBlockPermutation(size, &zbp);
mat.Permute(zbp);

Parameters
  • [out] size

  • [out] permutation – permutation vector for zero block permutation

void ILU0Factorize(void)

Perform ILU(0) factorization.

void LUFactorize(void)

Perform LU factorization.

void ILUTFactorize(double t, int maxrow)

Perform ILU(t,m) factorization based on threshold and maximum number of elements per row.

void ILUpFactorize(int p, bool level = true)

Perform ILU(p) factorization based on power.

void LUAnalyse(void)

Analyse the structure (level-scheduling)

void LUAnalyseClear(void)

Delete the analysed data (see LUAnalyse)

void LUSolve(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const

Solve LU out = in; if level-scheduling algorithm is provided then the graph traversing is performed in parallel.

void ICFactorize(LocalVector<ValueType> *inv_diag)

Perform IC(0) factorization.

void LLAnalyse(void)

Analyse the structure (level-scheduling)

void LLAnalyseClear(void)

Delete the analysed data (see LLAnalyse)

void LLSolve(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const

Solve LL^T out = in; if level-scheduling algorithm is provided then the graph traversing is performed in parallel.

void LLSolve(const LocalVector<ValueType> &in, const LocalVector<ValueType> &inv_diag, LocalVector<ValueType> *out) const

Solve LL^T out = in; if level-scheduling algorithm is provided then the graph traversing is performed in parallel.

void LAnalyse(bool diag_unit = false)

Analyse the structure (level-scheduling) L-part.

  • diag_unit == true the diag is 1;

  • diag_unit == false the diag is 0;

void LAnalyseClear(void)

Delete the analysed data (see LAnalyse) L-part.

void LSolve(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const

Solve L out = in; if level-scheduling algorithm is provided then the graph traversing is performed in parallel.
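For reference, a sequential forward substitution on a lower triangular CSR matrix might look as follows. This is a plain C++ sketch, not the library's implementation: the helper name lower_solve is hypothetical, the stored diagonal is always used, and no level scheduling is applied (level scheduling would process groups of mutually independent rows in parallel).

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of rocALUTION: solves L x = rhs by forward
// substitution, where L is lower triangular and stored in CSR format with an
// explicitly stored, non-zero diagonal.
std::vector<double> lower_solve(const std::vector<int>&    row_ptr,
                                const std::vector<int>&    col,
                                const std::vector<double>& val,
                                const std::vector<double>& rhs)
{
    assert(row_ptr.size() == rhs.size() + 1);

    const int           n = static_cast<int>(rhs.size());
    std::vector<double> x(n, 0.0);

    for(int i = 0; i < n; ++i)
    {
        double sum        = rhs[i];
        double diag       = 0.0;
        bool   found_diag = false;

        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            const int j = col[k];

            if(j < i)
            {
                // Contribution of an already computed unknown
                sum -= val[k] * x[j];
            }
            else if(j == i)
            {
                diag       = val[k];
                found_diag = true;
            }
        }

        assert(found_diag && diag != 0.0);
        (void)found_diag;

        x[i] = sum / diag;
    }

    return x;
}
```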

void UAnalyse(bool diag_unit = false)

Analyse the structure (level-scheduling) U-part.

  • diag_unit == true the diag is 1;

  • diag_unit == false the diag is 0;

void UAnalyseClear(void)

Delete the analysed data (see UAnalyse) U-part.

void USolve(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const

Solve U out = in; if level-scheduling algorithm is provided then the graph traversing is performed in parallel.

void Householder(int idx, ValueType &beta, LocalVector<ValueType> *vec) const

Compute Householder vector.

void QRDecompose(void)

QR Decomposition.

void QRSolve(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const

Solve QR out = in.

void Invert(void)

Matrix inversion using QR decomposition.

void ReadFileMTX(const std::string filename)

Read matrix from MTX (Matrix Market Format) file.

Read a matrix from Matrix Market Format file.

Example

LocalMatrix<ValueType> mat;
mat.ReadFileMTX("my_matrix.mtx");

Parameters

[in] filename – name of the file containing the MTX data.

void WriteFileMTX(const std::string filename) const

Write matrix to MTX (Matrix Market Format) file.

Write a matrix to Matrix Market Format file.

Example

LocalMatrix<ValueType> mat;

// Allocate and fill mat
// ...

mat.WriteFileMTX("my_matrix.mtx");

Parameters

[in] filename – name of the file to write the MTX data to.

void ReadFileCSR(const std::string filename)

Read matrix from CSR (rocALUTION binary format) file.

Read a CSR matrix from binary file. For details on the format, see WriteFileCSR().

Example

LocalMatrix<ValueType> mat;
mat.ReadFileCSR("my_matrix.csr");

Parameters

[in] filename – name of the file containing the data.

void WriteFileCSR(const std::string filename) const

Write CSR matrix to binary file.

Write a CSR matrix to binary file.

The binary format contains a header, the rocALUTION version and the matrix data as follows

// Header
out << "#rocALUTION binary csr file" << std::endl;

// rocALUTION version
out.write((char*)&version, sizeof(int));

// CSR matrix data
out.write((char*)&m, sizeof(int));
out.write((char*)&n, sizeof(int));
out.write((char*)&nnz, sizeof(int));
out.write((char*)csr_row_ptr, (m + 1) * sizeof(int));
out.write((char*)csr_col_ind, nnz * sizeof(int));
out.write((char*)csr_val, nnz * sizeof(double));

Example

LocalMatrix<ValueType> mat;

// Allocate and fill mat
// ...

mat.WriteFileCSR("my_matrix.csr");

Note

The values array is always stored in double precision (e.g. double or std::complex<double>).

Parameters

[in] filename – name of the file to write the data to.
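The layout listed above can be exercised with a small writer and reader on standard streams. This is a plain C++ sketch under assumptions, not the library's implementation: the CsrData struct and the functions write_csr, read_csr and round_trip are hypothetical names, the version value is a placeholder, real-valued data is assumed, and an in-memory stream stands in for a file.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the fields of the binary CSR layout
struct CsrData
{
    int                 version = 0; // placeholder, not a real version number
    int                 m       = 0;
    int                 n       = 0;
    int                 nnz     = 0;
    std::vector<int>    row_ptr;
    std::vector<int>    col_ind;
    std::vector<double> val;
};

// Writes the header line, the version and the CSR arrays in the listed order
void write_csr(std::ostream& out, const CsrData& a)
{
    out << "#rocALUTION binary csr file" << std::endl;
    out.write(reinterpret_cast<const char*>(&a.version), sizeof(int));
    out.write(reinterpret_cast<const char*>(&a.m), sizeof(int));
    out.write(reinterpret_cast<const char*>(&a.n), sizeof(int));
    out.write(reinterpret_cast<const char*>(&a.nnz), sizeof(int));
    out.write(reinterpret_cast<const char*>(a.row_ptr.data()), (a.m + 1) * sizeof(int));
    out.write(reinterpret_cast<const char*>(a.col_ind.data()), a.nnz * sizeof(int));
    out.write(reinterpret_cast<const char*>(a.val.data()), a.nnz * sizeof(double));
}

// Reads the same layout back; returns false on a bad header or short read
bool read_csr(std::istream& in, CsrData& a)
{
    std::string header;
    std::getline(in, header);

    if(header != "#rocALUTION binary csr file")
    {
        return false;
    }

    in.read(reinterpret_cast<char*>(&a.version), sizeof(int));
    in.read(reinterpret_cast<char*>(&a.m), sizeof(int));
    in.read(reinterpret_cast<char*>(&a.n), sizeof(int));
    in.read(reinterpret_cast<char*>(&a.nnz), sizeof(int));

    if(!in || a.m < 0 || a.nnz < 0)
    {
        return false;
    }

    a.row_ptr.resize(a.m + 1);
    a.col_ind.resize(a.nnz);
    a.val.resize(a.nnz);

    in.read(reinterpret_cast<char*>(a.row_ptr.data()), (a.m + 1) * sizeof(int));
    in.read(reinterpret_cast<char*>(a.col_ind.data()), a.nnz * sizeof(int));
    in.read(reinterpret_cast<char*>(a.val.data()), a.nnz * sizeof(double));

    return static_cast<bool>(in);
}

// Round trip through an in-memory binary stream instead of a file
CsrData round_trip(const CsrData& a)
{
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    write_csr(ss, a);

    CsrData    b;
    const bool ok = read_csr(ss, b);
    assert(ok);
    (void)ok;

    return b;
}
```

When working with files rather than string streams, opening them with std::ios::binary avoids newline translation on platforms that perform it.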

virtual void MoveToAccelerator(void)

Move the object to the accelerator backend.

virtual void MoveToAcceleratorAsync(void)

Move the object to the accelerator backend with async move.

virtual void MoveToHost(void)

Move the object to the host backend.

virtual void MoveToHostAsync(void)

Move the object to the host backend with async move.

virtual void Sync(void)

Sync (the async move)

void CopyFrom(const LocalMatrix<ValueType> &src)

Copy matrix from another LocalMatrix.

CopyFrom copies values and structure from another local matrix. Source and destination matrix should be in the same format.

Example

LocalMatrix<ValueType> mat1, mat2;

// Allocate and initialize mat1 and mat2
// ...

// Move mat1 to accelerator
mat1.MoveToAccelerator();

// Now, mat1 is on the accelerator (if available)
// and mat2 is on the host

// Copying mat2 into mat1 moves the data between the host and
// accelerator backends
mat1.CopyFrom(mat2);

Note

This function allows cross platform copying. One of the objects could be allocated on the accelerator backend.

Parameters

[in] src – Local matrix where values and structure should be copied from.

void CopyFromAsync(const LocalMatrix<ValueType> &src)

Async copy matrix (values and structure) from another LocalMatrix.

void CloneFrom(const LocalMatrix<ValueType> &src)

Clone the matrix.

CloneFrom clones the entire matrix, including values, structure and backend descriptor from another LocalMatrix.

Example

LocalMatrix<ValueType> mat;

// Allocate and initialize mat (host or accelerator)
// ...

LocalMatrix<ValueType> tmp;

// By cloning mat, tmp will have identical values and structure and will be on
// the same backend as mat
tmp.CloneFrom(mat);

Parameters

[in] src – LocalMatrix to clone from.

void UpdateValuesCSR(ValueType *val)

Update CSR matrix entries only, structure will remain the same.

void CopyFromCSR(const int *row_offsets, const int *col, const ValueType *val)

Copy (import) a CSR matrix described by three arrays (offsets, columns, values). The object data has to be allocated beforehand (call AllocateCSR first).

void CopyToCSR(int *row_offsets, int *col, ValueType *val) const

Copy (export) CSR matrix described in three arrays (offsets, columns, values). The output arrays have to be allocated.

void CopyFromCOO(const int *row, const int *col, const ValueType *val)

Copy (import) a COO matrix described by three arrays (rows, columns, values). The object data has to be allocated beforehand (call AllocateCOO first).

void CopyToCOO(int *row, int *col, ValueType *val) const

Copy (export) COO matrix described in three arrays (rows, columns, values). The output arrays have to be allocated.

void CopyFromHostCSR(const int *row_offset, const int *col, const ValueType *val, const std::string name, int nnz, int nrow, int ncol)

Allocates and copies (imports) a host CSR matrix.

If the CSR matrix data pointers are only accessible as constant, the user can create a LocalMatrix object and pass const CSR host pointers. The LocalMatrix will then be allocated and the data will be copied to the backend on which the LocalMatrix object is located.

Parameters
  • [in] row_offset – CSR matrix row offset pointers.

  • [in] col – CSR matrix column indices.

  • [in] val – CSR matrix values array.

  • [in] name – Matrix object name.

  • [in] nnz – Number of non-zero elements.

  • [in] nrow – Number of rows.

  • [in] ncol – Number of columns.

void CreateFromMap(const LocalVector<int> &map, int n, int m)

Create a restriction matrix operator based on an int vector map.

void CreateFromMap(const LocalVector<int> &map, int n, int m, LocalMatrix<ValueType> *pro)

Create a restriction and prolongation matrix operator based on an int vector map.

void ConvertToCSR(void)

Convert the matrix to CSR structure.

void ConvertToMCSR(void)

Convert the matrix to MCSR structure.

void ConvertToBCSR(int blockdim)

Convert the matrix to BCSR structure.

void ConvertToCOO(void)

Convert the matrix to COO structure.

void ConvertToELL(void)

Convert the matrix to ELL structure.

void ConvertToDIA(void)

Convert the matrix to DIA structure.

void ConvertToHYB(void)

Convert the matrix to HYB structure.

void ConvertToDENSE(void)

Convert the matrix to DENSE structure.

void ConvertTo(unsigned int matrix_format, int blockdim = 1)

Convert the matrix to specified matrix ID format.

virtual void Apply(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const

Apply the operator, out = Operator(in), where in and out are local vectors.

virtual void ApplyAdd(const LocalVector<ValueType> &in, ValueType scalar, LocalVector<ValueType> *out) const

Apply and add the operator, out += scalar * Operator(in), where in and out are local vectors.
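For a matrix in CSR format, the operation out += scalar * Operator(in) amounts to a sparse matrix-vector product. The following plain C++ sketch illustrates the arithmetic only; the helper name csr_apply_add is hypothetical and the code is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of rocALUTION: computes out += scalar * A * in
// for a matrix A given by raw CSR arrays.
void csr_apply_add(const std::vector<int>&    row_ptr,
                   const std::vector<int>&    col,
                   const std::vector<double>& val,
                   const std::vector<double>& in,
                   double                     scalar,
                   std::vector<double>&       out)
{
    assert(!row_ptr.empty());

    const std::size_t nrow = row_ptr.size() - 1;
    assert(out.size() == nrow);

    for(std::size_t i = 0; i < nrow; ++i)
    {
        // Dot product of row i with the input vector
        double sum = 0.0;

        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            sum += val[k] * in[col[k]];
        }

        out[i] += scalar * sum;
    }
}
```

Starting from a zero-initialized out and scalar = 1 gives the plain Apply result, out = A * in.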

void SymbolicPower(int p)

Perform symbolic computation (structure only) of \(|this|^p\).

void MatrixAdd(const LocalMatrix<ValueType> &mat, ValueType alpha = static_cast<ValueType>(1), ValueType beta = static_cast<ValueType>(1), bool structure = false)

Perform matrix addition, this = alpha*this + beta*mat.

  • if structure==false the sparsity pattern of the matrix is not changed;

  • if structure==true a new sparsity pattern is computed

void MatrixMult(const LocalMatrix<ValueType> &A, const LocalMatrix<ValueType> &B)

Multiply two matrices, this = A * B.

void DiagonalMatrixMult(const LocalVector<ValueType> &diag)

Multiply the matrix with diagonal matrix (stored in LocalVector), as DiagonalMatrixMultR()

void DiagonalMatrixMultL(const LocalVector<ValueType> &diag)

Multiply the matrix with diagonal matrix (stored in LocalVector), this=diag*this.

void DiagonalMatrixMultR(const LocalVector<ValueType> &diag)

Multiply the matrix with diagonal matrix (stored in LocalVector), this=this*diag.
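The difference between the left and right variants is easiest to see on raw CSR arrays: multiplying from the left scales rows, multiplying from the right scales columns. The sketch below is plain C++ with hypothetical helper names (diag_mult_left, diag_mult_right) and is not the library's implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of rocALUTION: A = diag * A, i.e. every entry
// of row i is scaled by diag[i].
void diag_mult_left(const std::vector<int>&    row_ptr,
                    std::vector<double>&       val,
                    const std::vector<double>& diag)
{
    assert(row_ptr.size() == diag.size() + 1);

    const int nrow = static_cast<int>(diag.size());

    for(int i = 0; i < nrow; ++i)
    {
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            val[k] *= diag[i];
        }
    }
}

// Hypothetical helper, not part of rocALUTION: A = A * diag, i.e. every entry
// in column j is scaled by diag[j].
void diag_mult_right(const std::vector<int>&    col,
                     std::vector<double>&       val,
                     const std::vector<double>& diag)
{
    assert(col.size() == val.size());

    for(std::size_t k = 0; k < val.size(); ++k)
    {
        val[k] *= diag[col[k]];
    }
}
```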

void Gershgorin(ValueType &lambda_min, ValueType &lambda_max) const

Compute the spectrum approximation with Gershgorin circles theorem.
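By the Gershgorin circle theorem, every eigenvalue lies in at least one disc centered at a diagonal entry a_ii with radius equal to the sum of |a_ij| over the off-diagonal entries of row i. For a real matrix with real eigenvalues (e.g. a symmetric one) this yields an interval. The sketch below is plain C++ for real values only; the helper name gershgorin_bounds is hypothetical and the code is not the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper, not part of rocALUTION: computes the interval
// [lambda_min, lambda_max] covered by the Gershgorin discs of a real CSR
// matrix.
void gershgorin_bounds(const std::vector<int>&    row_ptr,
                       const std::vector<int>&    col,
                       const std::vector<double>& val,
                       double&                    lambda_min,
                       double&                    lambda_max)
{
    assert(row_ptr.size() > 1);

    const int n = static_cast<int>(row_ptr.size()) - 1;

    lambda_min = std::numeric_limits<double>::infinity();
    lambda_max = -std::numeric_limits<double>::infinity();

    for(int i = 0; i < n; ++i)
    {
        double center = 0.0; // diagonal entry a_ii
        double radius = 0.0; // sum of |a_ij| for j != i

        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            if(col[k] == i)
            {
                center += val[k];
            }
            else
            {
                radius += std::fabs(val[k]);
            }
        }

        lambda_min = std::min(lambda_min, center - radius);
        lambda_max = std::max(lambda_max, center + radius);
    }
}
```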

void Compress(double drop_off)

Delete all entries in the matrix for which abs(a_ij) <= drop_off; the diagonal elements are never deleted.
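The drop rule can be sketched on raw CSR arrays as follows. This is plain C++ for illustration, not the library's implementation; the helper name csr_compress is hypothetical and real values are assumed.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper, not part of rocALUTION: removes all off-diagonal
// entries with |a_ij| <= drop_off from a CSR matrix, in place. Diagonal
// entries are always kept.
void csr_compress(std::vector<int>&    row_ptr,
                  std::vector<int>&    col,
                  std::vector<double>& val,
                  double               drop_off)
{
    assert(!row_ptr.empty());

    const int        n = static_cast<int>(row_ptr.size()) - 1;
    std::vector<int> new_ptr(n + 1, 0);

    // Write position; it never overtakes the read position k
    int out = 0;

    for(int i = 0; i < n; ++i)
    {
        new_ptr[i] = out;

        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            if(col[k] == i || std::fabs(val[k]) > drop_off)
            {
                col[out] = col[k];
                val[out] = val[k];
                ++out;
            }
        }
    }

    new_ptr[n] = out;

    col.resize(out);
    val.resize(out);
    row_ptr = new_ptr;
}
```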

void Transpose(void)

Transpose the matrix.

void Sort(void)

Sort the matrix indices.

Sorts the matrix by indices.

  • For CSR matrices, column indices are sorted.

  • For COO matrices, row indices are sorted.
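For the CSR case, sorting means reordering the column indices within each row while keeping every value attached to its index. The sketch below is plain C++ with a hypothetical helper name (csr_sort_rows) and is not the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper, not part of rocALUTION: sorts the entries of each CSR
// row by ascending column index and moves the values along with them.
void csr_sort_rows(const std::vector<int>& row_ptr,
                   std::vector<int>&       col,
                   std::vector<double>&    val)
{
    assert(!row_ptr.empty());
    assert(col.size() == val.size());

    const int n = static_cast<int>(row_ptr.size()) - 1;

    std::vector<std::pair<int, double>> row;

    for(int i = 0; i < n; ++i)
    {
        // Gather the (column, value) pairs of row i
        row.clear();
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            row.emplace_back(col[k], val[k]);
        }

        // Sort the pairs by column index
        std::sort(row.begin(), row.end(),
                  [](const std::pair<int, double>& a, const std::pair<int, double>& b) {
                      return a.first < b.first;
                  });

        // Write the sorted pairs back
        for(std::size_t p = 0; p < row.size(); ++p)
        {
            col[row_ptr[i] + p] = row[p].first;
            val[row_ptr[i] + p] = row[p].second;
        }
    }
}
```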

void Key(long int &row_key, long int &col_key, long int &val_key) const

Compute a unique hash key for the matrix arrays.

Typically, it is hard to check whether two matrices have the same structure (and values). To make this easier, rocALUTION provides a keying function that generates three keys, one each for the row index, column index and values arrays.

Parameters
  • [out] row_key – row index array key

  • [out] col_key – column index array key

  • [out] val_key – values array key

void ReplaceColumnVector(int idx, const LocalVector<ValueType> &vec)

Replace a column vector of a matrix.

void ReplaceRowVector(int idx, const LocalVector<ValueType> &vec)

Replace a row vector of a matrix.

void ExtractColumnVector(int idx, LocalVector<ValueType> *vec) const

Extract values from a column of a matrix to a vector.

void ExtractRowVector(int idx, LocalVector<ValueType> *vec) const

Extract values from a row of a matrix to a vector.

void AMGConnect(ValueType eps, LocalVector<int> *connections) const

Strong couplings for aggregation-based AMG.

void AMGAggregate(const LocalVector<int> &connections, LocalVector<int> *aggregates) const

Plain aggregation - Modification of a greedy aggregation scheme from Vanek (1996)

void AMGSmoothedAggregation(ValueType relax, const LocalVector<int> &aggregates, const LocalVector<int> &connections, LocalMatrix<ValueType> *prolong, LocalMatrix<ValueType> *restrict) const

Interpolation scheme based on smoothed aggregation from Vanek (1996)

void AMGAggregation(const LocalVector<int> &aggregates, LocalMatrix<ValueType> *prolong, LocalMatrix<ValueType> *restrict) const

Aggregation-based interpolation scheme.

void RugeStueben(ValueType eps, LocalMatrix<ValueType> *prolong, LocalMatrix<ValueType> *restrict) const

Ruge Stueben coarsening.

void FSAI(int power, const LocalMatrix<ValueType> *pattern)

Factorized Sparse Approximate Inverse assembly for given system matrix power pattern or external sparsity pattern.

void SPAI(void)

SParse Approximate Inverse assembly for given system matrix pattern.

void InitialPairwiseAggregation(ValueType beta, int &nc, LocalVector<int> *G, int &Gsize, int **rG, int &rGsize, int ordering) const

Initial Pairwise Aggregation scheme.

void InitialPairwiseAggregation(const LocalMatrix<ValueType> &mat, ValueType beta, int &nc, LocalVector<int> *G, int &Gsize, int **rG, int &rGsize, int ordering) const

Initial Pairwise Aggregation scheme for split matrices.

void FurtherPairwiseAggregation(ValueType beta, int &nc, LocalVector<int> *G, int &Gsize, int **rG, int &rGsize, int ordering) const

Further Pairwise Aggregation scheme.

void FurtherPairwiseAggregation(const LocalMatrix<ValueType> &mat, ValueType beta, int &nc, LocalVector<int> *G, int &Gsize, int **rG, int &rGsize, int ordering) const

Further Pairwise Aggregation scheme for split matrices.

void CoarsenOperator(LocalMatrix<ValueType> *Ac, int nrow, int ncol, const LocalVector<int> &G, int Gsize, const int *rG, int rGsize) const

Build coarse operator for pairwise aggregation scheme.

Local Stencil

template<typename ValueType>
class rocalution::LocalStencil : public rocalution::Operator<ValueType>

LocalStencil class.

A LocalStencil is called local, because it will always stay on a single system. The system can contain several CPUs via UMA or NUMA memory system or it can contain an accelerator.

Template Parameters

ValueType – can be int, float, double, std::complex<float> and std::complex<double>.

Public Functions

LocalStencil(unsigned int type)

Initialize a local stencil with a type.

virtual void Info() const

Print object information.

Info can print object information about any rocALUTION object. This information consists of object properties and backend data.

Example

mat.Info();
vec.Info();

int GetNDim(void) const

Return the dimension of the stencil.

virtual IndexType2 GetM(void) const

Return the number of rows in the matrix/stencil.

virtual IndexType2 GetN(void) const

Return the number of columns in the matrix/stencil.

virtual IndexType2 GetNnz(void) const

Return the number of non-zeros in the matrix/stencil.

void SetGrid(int size)

Set the stencil grid size.

virtual void Clear()

Clear (free all data) the object.

virtual void Apply(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const

Apply the operator, out = Operator(in), where in and out are local vectors.

virtual void ApplyAdd(const LocalVector<ValueType> &in, ValueType scalar, LocalVector<ValueType> *out) const

Apply and add the operator, out += scalar * Operator(in), where in and out are local vectors.

virtual void MoveToAccelerator(void)

Move the object to the accelerator backend.

virtual void MoveToHost(void)

Move the object to the host backend.

Global Matrix

template<typename ValueType>
class rocalution::GlobalMatrix : public rocalution::Operator<ValueType>

GlobalMatrix class.

A GlobalMatrix is called global, because it can stay on a single or on multiple nodes in a network. For this type of communication, MPI is used.

Template Parameters

ValueType – can be int, float, double, std::complex<float> and std::complex<double>.

Public Functions

GlobalMatrix(const ParallelManager &pm)

Initialize a global matrix with a parallel manager.

virtual IndexType2 GetM(void) const

Return the number of rows in the matrix/stencil.

virtual IndexType2 GetN(void) const

Return the number of columns in the matrix/stencil.

virtual IndexType2 GetNnz(void) const

Return the number of non-zeros in the matrix/stencil.

virtual int GetLocalM(void) const

Return the number of rows in the local matrix/stencil.

virtual int GetLocalN(void) const

Return the number of columns in the local matrix/stencil.

virtual int GetLocalNnz(void) const

Return the number of non-zeros in the local matrix/stencil.

virtual int GetGhostM(void) const

Return the number of rows in the ghost matrix/stencil.

virtual int GetGhostN(void) const

Return the number of columns in the ghost matrix/stencil.

virtual int GetGhostNnz(void) const

Return the number of non-zeros in the ghost matrix/stencil.

virtual void MoveToAccelerator(void)

Move the object to the accelerator backend.

virtual void MoveToHost(void)

Move the object to the host backend.

virtual void Info(void) const

Print object information.

Info can print object information about any rocALUTION object. This information consists of object properties and backend data.

Example

mat.Info();
vec.Info();

virtual bool Check(void) const

Return true if the matrix is ok (empty matrix is also ok) and false if there is something wrong with the structure or some of the values are NaN.

void AllocateCSR(std::string name, int local_nnz, int ghost_nnz)

Allocate CSR Matrix.

void AllocateCOO(std::string name, int local_nnz, int ghost_nnz)

Allocate COO Matrix.

virtual void Clear(void)

Clear (free all data) the object.

void SetParallelManager(const ParallelManager &pm)

Set the parallel manager of a global matrix.

void SetDataPtrCSR(int **local_row_offset, int **local_col, ValueType **local_val, int **ghost_row_offset, int **ghost_col, ValueType **ghost_val, std::string name, int local_nnz, int ghost_nnz)

Initialize a CSR matrix on the host with externally allocated data.

void SetDataPtrCOO(int **local_row, int **local_col, ValueType **local_val, int **ghost_row, int **ghost_col, ValueType **ghost_val, std::string name, int local_nnz, int ghost_nnz)

Initialize a COO matrix on the host with externally allocated data.

void SetLocalDataPtrCSR(int **row_offset, int **col, ValueType **val, std::string name, int nnz)

Initialize a CSR matrix on the host with externally allocated local data.

void SetLocalDataPtrCOO(int **row, int **col, ValueType **val, std::string name, int nnz)

Initialize a COO matrix on the host with externally allocated local data.

void SetGhostDataPtrCSR(int **row_offset, int **col, ValueType **val, std::string name, int nnz)

Initialize a CSR matrix on the host with externally allocated ghost data.

void SetGhostDataPtrCOO(int **row, int **col, ValueType **val, std::string name, int nnz)

Initialize a COO matrix on the host with externally allocated ghost data.

void LeaveDataPtrCSR(int **local_row_offset, int **local_col, ValueType **local_val, int **ghost_row_offset, int **ghost_col, ValueType **ghost_val)

Leave a CSR matrix to host pointers.

void LeaveDataPtrCOO(int **local_row, int **local_col, ValueType **local_val, int **ghost_row, int **ghost_col, ValueType **ghost_val)

Leave a COO matrix to host pointers.

void LeaveLocalDataPtrCSR(int **row_offset, int **col, ValueType **val)

Leave a local CSR matrix to host pointers.

void LeaveLocalDataPtrCOO(int **row, int **col, ValueType **val)

Leave a local COO matrix to host pointers.

void LeaveGhostDataPtrCSR(int **row_offset, int **col, ValueType **val)

Leave a CSR ghost matrix to host pointers.

void LeaveGhostDataPtrCOO(int **row, int **col, ValueType **val)

Leave a COO ghost matrix to host pointers.

void CloneFrom(const GlobalMatrix<ValueType> &src)

Clone the entire matrix (values, structure and backend descriptor) from another GlobalMatrix.

void CopyFrom(const GlobalMatrix<ValueType> &src)

Copy matrix (values and structure) from another GlobalMatrix.

void ConvertToCSR(void)

Convert the matrix to CSR structure.

void ConvertToMCSR(void)

Convert the matrix to MCSR structure.

void ConvertToBCSR(int blockdim)

Convert the matrix to BCSR structure.

void ConvertToCOO(void)

Convert the matrix to COO structure.

void ConvertToELL(void)

Convert the matrix to ELL structure.

void ConvertToDIA(void)

Convert the matrix to DIA structure.

void ConvertToHYB(void)

Convert the matrix to HYB structure.

void ConvertToDENSE(void)

Convert the matrix to DENSE structure.

void ConvertTo(unsigned int matrix_format, int blockdim = 1)

Convert the matrix to specified matrix ID format.

virtual void Apply(const GlobalVector<ValueType> &in, GlobalVector<ValueType> *out) const

Apply the operator, out = Operator(in), where in and out are global vectors.

virtual void ApplyAdd(const GlobalVector<ValueType> &in, ValueType scalar, GlobalVector<ValueType> *out) const

Apply and add the operator, out += scalar * Operator(in), where in and out are global vectors.

void ReadFileMTX(const std::string filename)

Read matrix from MTX (Matrix Market Format) file.

void WriteFileMTX(const std::string filename) const

Write matrix to MTX (Matrix Market Format) file.

void ReadFileCSR(const std::string filename)

Read matrix from CSR (rocALUTION binary format) file.

void WriteFileCSR(const std::string filename) const

Write matrix to CSR (rocALUTION binary format) file.

void Sort(void)

Sort the matrix indices.

void ExtractInverseDiagonal(GlobalVector<ValueType> *vec_inv_diag) const

Extract the inverse (reciprocal) diagonal values of the matrix into a GlobalVector.

void Scale(ValueType alpha)

Scale all the values in the matrix.

void InitialPairwiseAggregation(ValueType beta, int &nc, LocalVector<int> *G, int &Gsize, int **rG, int &rGsize, int ordering) const

Initial Pairwise Aggregation scheme.

void FurtherPairwiseAggregation(ValueType beta, int &nc, LocalVector<int> *G, int &Gsize, int **rG, int &rGsize, int ordering) const

Further Pairwise Aggregation scheme.

void CoarsenOperator(GlobalMatrix<ValueType> *Ac, ParallelManager *pm, int nrow, int ncol, const LocalVector<int> &G, int Gsize, const int *rG, int rGsize) const

Build coarse operator for pairwise aggregation scheme.

Local Vector

template<typename ValueType>
class rocalution::LocalVector : public rocalution::Vector<ValueType>

LocalVector class.

A LocalVector is called local, because it will always stay on a single system. The system can contain several CPUs via UMA or NUMA memory system or it can contain an accelerator.

Template Parameters

ValueType – can be int, float, double, std::complex<float> and std::complex<double>.

Unnamed Group

ValueType &operator[](int i)

Access operator (only for host data)

The elements in the vector can be accessed via [] operators, when the vector is allocated on the host.

Example

// rocALUTION local vector object
LocalVector<ValueType> vec;

// Allocate vector
vec.Allocate("my_vector", 100);

// Initialize vector with 1
vec.Ones();

// Set even elements to -1
for(int i = 0; i < vec.GetSize(); i += 2)
{
  vec[i] = -1;
}

Parameters

[in] i – access data at index i

Returns

value at index i

const ValueType &operator[](int i) const

Access operator (only for host data)

The elements in the vector can be accessed via [] operators, when the vector is allocated on the host.

Example

// rocALUTION local vector object
LocalVector<ValueType> vec;

// Allocate vector
vec.Allocate("my_vector", 100);

// Initialize vector with 1
vec.Ones();

// Set even elements to -1
for(int i = 0; i < vec.GetSize(); i += 2)
{
  vec[i] = -1;
}

Parameters

[in] i – access data at index i

Returns

value at index i

Public Functions

virtual void MoveToAccelerator(void)

Move the object to the accelerator backend.

virtual void MoveToAcceleratorAsync(void)

Move the object to the accelerator backend with async move.

virtual void MoveToHost(void)

Move the object to the host backend.

virtual void MoveToHostAsync(void)

Move the object to the host backend with async move.

virtual void Sync(void)

Sync (the async move)

virtual void Info(void) const

Print object information.

Info can print object information about any rocALUTION object. This information consists of object properties and backend data.

Example

mat.Info();
vec.Info();

virtual IndexType2 GetSize(void) const

Return the size of the vector.

virtual bool Check(void) const

Perform a sanity check of the vector.

Checks whether the vector contains valid data, i.e. whether the values are neither infinity nor NaN (not a number).

Return values
  • true – if the vector is ok (empty vector is also ok).

  • false – if there is something wrong with the values.
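
The kind of test described for Check() can be sketched on a plain std::vector. This is only an illustration of the idea, not the rocALUTION implementation; the helper name check_values is hypothetical.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not part of rocALUTION): returns true if every
// entry is finite, i.e. neither +/-infinity nor NaN.
// An empty vector passes the check.
bool check_values(const std::vector<double>& v)
{
    for(double x : v)
    {
        if(!std::isfinite(x))
        {
            return false;
        }
    }
    return true;
}
```

Calling check_values on an empty vector returns true, matching the remark that an empty vector is also ok.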

void Allocate(std::string name, IndexType2 size)

Allocate a local vector with name and size.

The local vector allocation function requires a name of the object (this is only for information purposes) and corresponding size description for vector objects.

Example

LocalVector<ValueType> vec;

vec.Allocate("my vector", 100);
vec.Clear();

Parameters
  • [in] name – object name

  • [in] size – number of elements in the vector

void SetDataPtr(ValueType **ptr, std::string name, int size)

Initialize a LocalVector on the host with externally allocated data.

SetDataPtr has direct access to the raw data via pointers. Already allocated data can be set by passing the pointer.

Example

// Allocate vector
ValueType* ptr_vec = new ValueType[200];

// Fill vector
// ...

// rocALUTION local vector object
LocalVector<ValueType> vec;

// Set the vector data, ptr_vec will become invalid
vec.SetDataPtr(&ptr_vec, "my_vector", 200);

Note

Setting data pointer will leave the original pointer empty (set to NULL).

void LeaveDataPtr(ValueType **ptr)

Leave a LocalVector to host pointers.

LeaveDataPtr has direct access to the raw data via pointers. A LocalVector object can leave its raw data to a host pointer. This will leave the LocalVector empty.

Example

// rocALUTION local vector object
LocalVector<ValueType> vec;

// Allocate the vector
vec.Allocate("my_vector", 100);

// Fill vector
// ...

ValueType* ptr_vec = NULL;

// Get (steal) the data from the vector, this will leave the local vector object empty
vec.LeaveDataPtr(&ptr_vec);
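
The ownership hand-over described for SetDataPtr() and LeaveDataPtr() can be illustrated with a toy class. The sketch below is an assumption-based stand-in (ToyVector and ownership_round_trip are hypothetical names, and the real LocalVector does considerably more); it only shows how passing a pointer-to-pointer lets the callee reset the caller's pointer.

```cpp
// Toy stand-in for a vector class; NOT the rocALUTION implementation.
class ToyVector
{
public:
    ToyVector() = default;
    ToyVector(const ToyVector&) = delete;
    ToyVector& operator=(const ToyVector&) = delete;
    ~ToyVector() { delete[] data_; }

    // Take ownership of *ptr and reset the caller's pointer to nullptr.
    void SetDataPtr(double** ptr, int size)
    {
        delete[] data_;
        data_ = *ptr;
        size_ = size;
        *ptr  = nullptr;
    }

    // Hand the buffer back to the caller and leave this object empty.
    void LeaveDataPtr(double** ptr)
    {
        *ptr  = data_;
        data_ = nullptr;
        size_ = 0;
    }

    int GetSize() const { return size_; }

private:
    double* data_ = nullptr;
    int     size_ = 0;
};

// Round trip: returns true if ownership moved in and back out as described.
bool ownership_round_trip()
{
    double* raw      = new double[4]{1.0, 2.0, 3.0, 4.0};
    double* original = raw;

    ToyVector vec;
    vec.SetDataPtr(&raw, 4);
    const bool moved_in = (raw == nullptr) && (vec.GetSize() == 4);

    double* out = nullptr;
    vec.LeaveDataPtr(&out);
    const bool moved_out = (out == original) && (vec.GetSize() == 0) && (out[2] == 3.0);

    delete[] out; // the caller owns the buffer again
    return moved_in && moved_out;
}
```

After LeaveDataPtr the caller is responsible for releasing the buffer, which is why the sketch ends with delete[].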

virtual void Clear()

Clear (free all data) the object.

virtual void Zeros()

Set all values of the vector to 0.

virtual void Ones()

Set all values of the vector to 1.

virtual void SetValues(ValueType val)

Set all values of the vector to given argument.

virtual void SetRandomUniform(unsigned long long seed, ValueType a = static_cast<ValueType>(-1), ValueType b = static_cast<ValueType>(1))

Fill the vector with random values from interval [a,b].

virtual void SetRandomNormal(unsigned long long seed, ValueType mean = static_cast<ValueType>(0), ValueType var = static_cast<ValueType>(1))

Fill the vector with random values from normal distribution.

virtual void ReadFileASCII(const std::string filename)

Read vector from ASCII file.

Read a vector from ASCII file.

Example

LocalVector<ValueType> vec;
vec.ReadFileASCII("my_vector.dat");

Parameters

[in] filename – name of the file containing the ASCII data.

virtual void WriteFileASCII(const std::string filename) const

Write vector to ASCII file.

Write a vector to ASCII file.

Example

LocalVector<ValueType> vec;

// Allocate and fill vec
// ...

vec.WriteFileASCII("my_vector.dat");

Parameters

[in] filename – name of the file to write the ASCII data to.

virtual void ReadFileBinary(const std::string filename)

Read vector from binary file.

Read a vector from binary file. For details on the format, see WriteFileBinary().

Example

LocalVector<ValueType> vec;
vec.ReadFileBinary("my_vector.bin");

Parameters

[in] filename – name of the file containing the data.

virtual void WriteFileBinary(const std::string filename) const

Write vector to binary file.

Write a vector to binary file.

The binary format contains a header, the rocALUTION version and the vector data as follows

// Header
out << "#rocALUTION binary vector file" << std::endl;

// rocALUTION version
out.write((char*)&version, sizeof(int));

// Vector data
out.write((char*)&size, sizeof(int));
out.write((char*)vec_val, size * sizeof(double));

Example

LocalVector<ValueType> vec;

// Allocate and fill vec
// ...

vec.WriteFileBinary("my_vector.bin");

Note

Vector values array is always stored in double precision (e.g. double or std::complex<double>).

Parameters

[in] filename – name of the file to write the data to.

virtual void CopyFrom(const LocalVector<ValueType> &src)

Copy vector from another vector.

CopyFrom copies values from another vector.

Example

LocalVector<ValueType> vec1, vec2;

// Allocate and initialize vec1 and vec2
// ...

// Move vec1 to accelerator
vec1.MoveToAccelerator();

// Now, vec1 is on the accelerator (if available)
// and vec2 is on the host

// Copying vec2 into vec1 (or vice versa) moves data between the host and
// the accelerator backend
vec1.CopyFrom(vec2);

Note

This function allows cross platform copying. One of the objects could be allocated on the accelerator backend.

Parameters

[in] src – Vector, where values should be copied from.

virtual void CopyFromAsync(const LocalVector<ValueType> &src)

Async copy from another local vector.

virtual void CopyFromFloat(const LocalVector<float> &src)

Copy values from another local float vector.

virtual void CopyFromDouble(const LocalVector<double> &src)

Copy values from another local double vector.

virtual void CopyFrom(const LocalVector<ValueType> &src, int src_offset, int dst_offset, int size)

Copy vector from another vector with offsets and size.

CopyFrom copies values with specific source and destination offsets and sizes from another vector.

Note

This function allows cross platform copying. One of the objects could be allocated on the accelerator backend.

Parameters
  • [in] src – Vector, where values should be copied from.

  • [in] src_offset – source offset.

  • [in] dst_offset – destination offset.

  • [in] size – number of entries to be copied.
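
The meaning of the three integer arguments can be sketched on std::vector: size entries starting at src_offset in the source are written starting at dst_offset in the destination. The helper copy_with_offsets below is hypothetical and the bounds handling is an assumption made for the sketch, not documented library behaviour.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// dst[dst_offset + i] = src[src_offset + i] for i in [0, size).
// Returns false (and copies nothing) if a range is out of bounds.
bool copy_with_offsets(const std::vector<double>& src,
                       std::vector<double>&       dst,
                       int                        src_offset,
                       int                        dst_offset,
                       int                        size)
{
    if(src_offset < 0 || dst_offset < 0 || size < 0)
    {
        return false;
    }
    const std::size_t n = static_cast<std::size_t>(size);
    if(static_cast<std::size_t>(src_offset) + n > src.size())
    {
        return false;
    }
    if(static_cast<std::size_t>(dst_offset) + n > dst.size())
    {
        return false;
    }

    std::copy_n(src.begin() + src_offset, n, dst.begin() + dst_offset);
    return true;
}
```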

void CopyFromPermute(const LocalVector<ValueType> &src, const LocalVector<int> &permutation)

Copy a vector under permutation (forward permutation)

void CopyFromPermuteBackward(const LocalVector<ValueType> &src, const LocalVector<int> &permutation)

Copy a vector under permutation (backward permutation)
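
The difference between the forward and the backward variant can be sketched as follows. The convention used here (forward scatters, dst[perm[i]] = src[i]; backward gathers, dst[i] = src[perm[i]]) is an assumption for illustration and should be checked against the library; with this convention the backward copy undoes the forward copy. The function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Forward permutation (assumed convention): dst[perm[i]] = src[i].
std::vector<double> permute_forward(const std::vector<double>& src,
                                    const std::vector<int>&    perm)
{
    std::vector<double> dst(src.size());
    for(std::size_t i = 0; i < src.size(); ++i)
    {
        dst[perm[i]] = src[i];
    }
    return dst;
}

// Backward permutation (assumed convention): dst[i] = src[perm[i]].
std::vector<double> permute_backward(const std::vector<double>& src,
                                     const std::vector<int>&    perm)
{
    std::vector<double> dst(src.size());
    for(std::size_t i = 0; i < src.size(); ++i)
    {
        dst[i] = src[perm[i]];
    }
    return dst;
}
```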

virtual void CloneFrom(const LocalVector<ValueType> &src)

Clone the vector.

CloneFrom clones the entire vector, with data and backend descriptor from another Vector.

Example

LocalVector<ValueType> vec;

// Allocate and initialize vec (host or accelerator)
// ...

LocalVector<ValueType> tmp;

// By cloning vec, tmp will have identical values and will be on the same
// backend as vec
tmp.CloneFrom(vec);

Parameters

[in] src – Vector to clone from.

void CopyFromData(const ValueType *data)

Copy (import) vector.

Copy (import) vector data that is described in one array (values). The object data has to be allocated with Allocate(), using the corresponding size of the data, first.

Parameters

[in] data – data to be imported.

void CopyToData(ValueType *data) const

Copy (export) vector.

Copy (export) vector data that is described in one array (values). The output array has to be allocated, using the corresponding size of the data, first. Size can be obtained by GetSize().

Parameters

[out] data – exported data.

void Permute(const LocalVector<int> &permutation)

Perform in-place permutation (forward) of the vector.

void PermuteBackward(const LocalVector<int> &permutation)

Perform in-place permutation (backward) of the vector.

void Restriction(const LocalVector<ValueType> &vec_fine, const LocalVector<int> &map)

Restriction operator based on restriction mapping vector.

void Prolongation(const LocalVector<ValueType> &vec_coarse, const LocalVector<int> &map)

Prolongation operator based on restriction mapping vector.
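
A mapping-based restriction and prolongation pair can be sketched on plain arrays. The sketch assumes that map[i] holds the coarse index of fine entry i and that -1 marks a fine entry without a coarse partner; restriction then sums fine values into their coarse entry and prolongation copies coarse values back. These conventions and the function names are assumptions for illustration, not a statement about the library internals.

```cpp
#include <cstddef>
#include <vector>

// Restriction (assumed convention): coarse[map[i]] += fine[i] for map[i] != -1.
std::vector<double> restrict_by_map(const std::vector<double>& fine,
                                    const std::vector<int>&    map,
                                    std::size_t                coarse_size)
{
    std::vector<double> coarse(coarse_size, 0.0);
    for(std::size_t i = 0; i < fine.size(); ++i)
    {
        if(map[i] != -1)
        {
            coarse[map[i]] += fine[i];
        }
    }
    return coarse;
}

// Prolongation (assumed convention): fine[i] = coarse[map[i]], or 0 for map[i] == -1.
std::vector<double> prolong_by_map(const std::vector<double>& coarse,
                                   const std::vector<int>&    map)
{
    std::vector<double> fine(map.size(), 0.0);
    for(std::size_t i = 0; i < map.size(); ++i)
    {
        if(map[i] != -1)
        {
            fine[i] = coarse[map[i]];
        }
    }
    return fine;
}
```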

virtual void AddScale(const LocalVector<ValueType> &x, ValueType alpha)

Perform vector update of type this = this + alpha * x.

virtual void ScaleAdd(ValueType alpha, const LocalVector<ValueType> &x)

Perform vector update of type this = alpha * this + x.

virtual void ScaleAddScale(ValueType alpha, const LocalVector<ValueType> &x, ValueType beta)

Perform vector update of type this = alpha * this + x * beta.

virtual void ScaleAddScale(ValueType alpha, const LocalVector<ValueType> &x, ValueType beta, int src_offset, int dst_offset, int size)

Perform vector update of type this = alpha * this + x * beta with offsets.

virtual void ScaleAdd2(ValueType alpha, const LocalVector<ValueType> &x, ValueType beta, const LocalVector<ValueType> &y, ValueType gamma)

Perform vector update of type this = alpha * this + x * beta + y * gamma.

virtual void Scale(ValueType alpha)

Perform vector scaling this = alpha * this.
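
The update formulas above differ only in which operand is scaled. A minimal sketch on std::vector<double> makes the argument order explicit; the free functions are hypothetical stand-ins for the member functions and assume that all vectors have the same size.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// this = this + alpha * x
void add_scale(Vec& self, const Vec& x, double alpha)
{
    for(std::size_t i = 0; i < self.size(); ++i)
        self[i] += alpha * x[i];
}

// this = alpha * this + x
void scale_add(Vec& self, double alpha, const Vec& x)
{
    for(std::size_t i = 0; i < self.size(); ++i)
        self[i] = alpha * self[i] + x[i];
}

// this = alpha * this + x * beta
void scale_add_scale(Vec& self, double alpha, const Vec& x, double beta)
{
    for(std::size_t i = 0; i < self.size(); ++i)
        self[i] = alpha * self[i] + x[i] * beta;
}

// this = alpha * this + x * beta + y * gamma
void scale_add2(Vec& self, double alpha, const Vec& x, double beta, const Vec& y, double gamma)
{
    for(std::size_t i = 0; i < self.size(); ++i)
        self[i] = alpha * self[i] + x[i] * beta + y[i] * gamma;
}
```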

virtual ValueType Dot(const LocalVector<ValueType> &x) const

Compute dot (scalar) product, return this^T x.

virtual ValueType DotNonConj(const LocalVector<ValueType> &x) const

Compute non-conjugate dot (scalar) product, return this^T x.

virtual ValueType Norm(void) const

Compute \(L_2\) norm of the vector, return = sqrt(this^T this)

virtual ValueType Reduce(void) const

Reduce the vector.

virtual ValueType Asum(void) const

Compute the sum of absolute values of the vector, return = sum(|this|)

virtual int Amax(ValueType &value) const

Compute the absolute max of the vector, return = index(max(|this|))
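
For real-valued data the reductions Norm(), Asum() and Amax() can be sketched as below. The functions are hypothetical stand-ins; in particular, returning the first index of the largest absolute value, storing that absolute value in value, and returning -1 for an empty vector are assumptions of the sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// L2 norm: sqrt(sum v_i^2)
double norm2(const Vec& v)
{
    double sum = 0.0;
    for(double x : v)
        sum += x * x;
    return std::sqrt(sum);
}

// Sum of absolute values: sum |v_i|
double asum(const Vec& v)
{
    double sum = 0.0;
    for(double x : v)
        sum += std::fabs(x);
    return sum;
}

// Index of the entry with the largest absolute value; value receives max |v_i|.
int amax(const Vec& v, double& value)
{
    int index = -1;
    value     = 0.0;
    for(std::size_t i = 0; i < v.size(); ++i)
    {
        if(index == -1 || std::fabs(v[i]) > value)
        {
            value = std::fabs(v[i]);
            index = static_cast<int>(i);
        }
    }
    return index;
}
```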

virtual void PointWiseMult(const LocalVector<ValueType> &x)

Perform point-wise multiplication (element-wise) of this = this * x.

virtual void PointWiseMult(const LocalVector<ValueType> &x, const LocalVector<ValueType> &y)

Perform point-wise multiplication (element-wise) of this = x * y.

virtual void Power(double power)

Perform power operation to a vector.

void SetIndexArray(int size, const int *index)

Set index array.

void GetIndexValues(ValueType *values) const

Get indexed values.

void SetIndexValues(const ValueType *values)

Set indexed values.

void GetContinuousValues(int start, int end, ValueType *values) const

Get continuous indexed values.

void SetContinuousValues(int start, int end, const ValueType *values)

Set continuous indexed values.

void ExtractCoarseMapping(int start, int end, const int *index, int nc, int *size, int *map) const

Extract coarse boundary mapping.

void ExtractCoarseBoundary(int start, int end, const int *index, int nc, int *size, int *boundary) const

Extract coarse boundary index.

Global Vector

template<typename ValueType>
class rocalution::GlobalVector : public rocalution::Vector<ValueType>

GlobalVector class.

A GlobalVector is called global, because it can stay on a single or on multiple nodes in a network. For this type of communication, MPI is used.

Template Parameters

ValueType – can be int, float, double, std::complex<float> or std::complex<double>.

Public Functions

GlobalVector(const ParallelManager &pm)

Initialize a global vector with a parallel manager.

virtual void MoveToAccelerator(void)

Move the object to the accelerator backend.

virtual void MoveToHost(void)

Move the object to the host backend.

virtual void Info(void) const

Print object information.

Info can print object information about any rocALUTION object. This information consists of object properties and backend data.

Example

mat.Info();
vec.Info();

virtual bool Check(void) const

Perform a sanity check of the vector.

Checks whether the vector contains valid data, i.e. whether the values are neither infinity nor NaN (not a number).

Return values
  • true – if the vector is ok (empty vector is also ok).

  • false – if there is something wrong with the values.

virtual IndexType2 GetSize(void) const

Return the size of the vector.

virtual int GetLocalSize(void) const

Return the size of the local vector.

virtual int GetGhostSize(void) const

Return the size of the ghost vector.

virtual void Allocate(std::string name, IndexType2 size)

Allocate a global vector with name and size.

virtual void Clear(void)

Clear (free all data) the object.

void SetParallelManager(const ParallelManager &pm)

Set the parallel manager of a global vector.

virtual void Zeros(void)

Set all values of the vector to 0.

virtual void Ones(void)

Set all values of the vector to 1.

virtual void SetValues(ValueType val)

Set all values of the vector to given argument.

virtual void SetRandomUniform(unsigned long long seed, ValueType a = static_cast<ValueType>(-1), ValueType b = static_cast<ValueType>(1))

Fill the vector with random values from interval [a,b].

virtual void SetRandomNormal(unsigned long long seed, ValueType mean = static_cast<ValueType>(0), ValueType var = static_cast<ValueType>(1))

Fill the vector with random values from normal distribution.

virtual void CloneFrom(const GlobalVector<ValueType> &src)

Clone the vector.

CloneFrom clones the entire vector, with data and backend descriptor from another Vector.

Example

LocalVector<ValueType> vec;

// Allocate and initialize vec (host or accelerator)
// ...

LocalVector<ValueType> tmp;

// By cloning vec, tmp will have identical values and will be on the same
// backend as vec
tmp.CloneFrom(vec);

Parameters

[in] src – Vector to clone from.

ValueType &operator[](int i)

Access operator (only for host data)

const ValueType &operator[](int i) const

Access operator (only for host data)

void SetDataPtr(ValueType **ptr, std::string name, IndexType2 size)

Initialize the local part of a global vector with externally allocated data.

void LeaveDataPtr(ValueType **ptr)

Get a pointer to the data from the local part of a global vector and free the global vector object.

virtual void CopyFrom(const GlobalVector<ValueType> &src)

Copy vector from another vector.

CopyFrom copies values from another vector.

Example

LocalVector<ValueType> vec1, vec2;

// Allocate and initialize vec1 and vec2
// ...

// Move vec1 to accelerator
vec1.MoveToAccelerator();

// Now, vec1 is on the accelerator (if available)
// and vec2 is on the host

// Copying vec2 into vec1 (or vice versa) moves data between the host and
// the accelerator backend
vec1.CopyFrom(vec2);

Note

This function allows cross platform copying. One of the objects could be allocated on the accelerator backend.

Parameters

[in] src – Vector, where values should be copied from.

virtual void ReadFileASCII(const std::string filename)

Read vector from ASCII file.

Read a vector from ASCII file.

Example

LocalVector<ValueType> vec;
vec.ReadFileASCII("my_vector.dat");

Parameters

[in] filename – name of the file containing the ASCII data.

virtual void WriteFileASCII(const std::string filename) const

Write vector to ASCII file.

Write a vector to ASCII file.

Example

LocalVector<ValueType> vec;

// Allocate and fill vec
// ...

vec.WriteFileASCII("my_vector.dat");

Parameters

[in] filename – name of the file to write the ASCII data to.

virtual void ReadFileBinary(const std::string filename)

Read vector from binary file.

Read a vector from binary file. For details on the format, see WriteFileBinary().

Example

LocalVector<ValueType> vec;
vec.ReadFileBinary("my_vector.bin");

Parameters

[in] filename – name of the file containing the data.

virtual void WriteFileBinary(const std::string filename) const

Write vector to binary file.

Write a vector to binary file.

The binary format contains a header, the rocALUTION version and the vector data as follows

// Header
out << "#rocALUTION binary vector file" << std::endl;

// rocALUTION version
out.write((char*)&version, sizeof(int));

// Vector data
out.write((char*)&size, sizeof(int));
out.write((char*)vec_val, size * sizeof(double));

Example

LocalVector<ValueType> vec;

// Allocate and fill vec
// ...

vec.WriteFileBinary("my_vector.bin");

Note

Vector values array is always stored in double precision (e.g. double or std::complex<double>).

Parameters

[in] filename – name of the file to write the data to.

virtual void AddScale(const GlobalVector<ValueType> &x, ValueType alpha)

Perform vector update of type this = this + alpha * x.

virtual void ScaleAdd(ValueType alpha, const GlobalVector<ValueType> &x)

Perform vector update of type this = alpha * this + x.

virtual void ScaleAdd2(ValueType alpha, const GlobalVector<ValueType> &x, ValueType beta, const GlobalVector<ValueType> &y, ValueType gamma)

Perform vector update of type this = alpha * this + x * beta + y * gamma.

virtual void ScaleAddScale(ValueType alpha, const GlobalVector<ValueType> &x, ValueType beta)

Perform vector update of type this = alpha * this + x * beta.

virtual void Scale(ValueType alpha)

Perform vector scaling this = alpha * this.

virtual ValueType Dot(const GlobalVector<ValueType> &x) const

Compute dot (scalar) product, return this^T x.

virtual ValueType DotNonConj(const GlobalVector<ValueType> &x) const

Compute non-conjugate dot (scalar) product, return this^T x.

virtual ValueType Norm(void) const

Compute \(L_2\) norm of the vector, return = sqrt(this^T this)

virtual ValueType Reduce(void) const

Reduce the vector.

virtual ValueType Asum(void) const

Compute the sum of absolute values of the vector, return = sum(|this|)

virtual int Amax(ValueType &value) const

Compute the absolute max of the vector, return = index(max(|this|))

virtual void PointWiseMult(const GlobalVector<ValueType> &x)

Perform point-wise multiplication (element-wise) of this = this * x.

virtual void PointWiseMult(const GlobalVector<ValueType> &x, const GlobalVector<ValueType> &y)

Perform point-wise multiplication (element-wise) of this = x * y.

virtual void Power(double power)

Perform power operation to a vector.

void Restriction(const GlobalVector<ValueType> &vec_fine, const LocalVector<int> &map)

Restriction operator based on restriction mapping vector.

void Prolongation(const GlobalVector<ValueType> &vec_coarse, const LocalVector<int> &map)

Prolongation operator based on restriction mapping vector.

Base Classes

template<typename ValueType>
class BaseMatrix
template<typename ValueType>
class BaseStencil
template<typename ValueType>
class BaseVector
template<typename ValueType>
class HostMatrix
template<typename ValueType>
class HostStencil
template<typename ValueType>
class HostVector
template<typename ValueType>
class AcceleratorMatrix
template<typename ValueType>
class AcceleratorStencil
template<typename ValueType>
class AcceleratorVector

Parallel Manager

class rocalution::ParallelManager : public rocalution::RocalutionObj

Parallel Manager class.

The parallel manager class handles the communication and the mapping of the global operators. Each global operator and vector needs to be initialized with a valid parallel manager in order to perform any operation. For many distributed simulations, the underlying operator is already distributed. This information needs to be passed to the parallel manager.

Public Functions

void SetMPICommunicator(const void *comm)

Set the MPI communicator.

void Clear(void)

Clear all allocated resources.

IndexType2 GetGlobalSize(void) const

Return the global size.

int GetLocalSize(void) const

Return the local size.

int GetNumReceivers(void) const

Return the number of receivers.

int GetNumSenders(void) const

Return the number of senders.

int GetNumProcs(void) const

Return the number of involved processes.

void SetGlobalSize(IndexType2 size)

Initialize the global size.

void SetLocalSize(int size)

Initialize the local size.

void SetBoundaryIndex(int size, const int *index)

Set all boundary indices of this rank's process.

void SetReceivers(int nrecv, const int *recvs, const int *recv_offset)

Set the number of processes the current process receives data from, the array of those processes, and the offsets where the boundary for each receiver process starts.

void SetSenders(int nsend, const int *sends, const int *send_offset)

Set the number of processes the current process sends data to, the array of those processes, and the offsets where the ghost part for each sender process starts.

void LocalToGlobal(int proc, int local, int &global)

Mapping local to global.

void GlobalToLocal(int global, int &proc, int &local)

Mapping global to local.

bool Status(void) const

Check sanity status of parallel manager.

void ReadFileASCII(const std::string filename)

Read file that contains all relevant parallel manager data.

void WriteFileASCII(const std::string filename) const

Write file that contains all relevant parallel manager data.

Solvers

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::Solver : public rocalution::RocalutionObj

Base class for all solvers and preconditioners.

Most of the solvers can be performed on linear operators LocalMatrix, LocalStencil and GlobalMatrix - i.e. the solvers can be performed locally (on a shared memory system) or in a distributed manner (on a cluster) via MPI. The only exception is the AMG (Algebraic Multigrid) solver which has two versions (one for LocalMatrix and one for GlobalMatrix class). The only pure local solvers (which do not support global/MPI operations) are the mixed-precision defect-correction solver and all direct solvers.

All solvers need three template parameters - Operators, Vectors and Scalar type.

The Solver class is purely virtual and provides an interface for

  • SetOperator() to set the operator \(A\), i.e. the user can pass the matrix here.

  • Build() to build the solver (including preconditioners, sub-solvers, etc.). The user needs to specify the operator first before calling Build().

  • Solve() to solve the system \(Ax = b\). The user needs to pass a right-hand-side \(b\) and a vector \(x\), where the solution will be obtained.

  • Print() to show solver information.

  • ReBuildNumeric() to only re-build the solver numerically (if possible).

  • MoveToHost() and MoveToAccelerator() to offload the solver (including preconditioners and sub-solvers) to the host/accelerator.

Template Parameters
  • OperatorType – can be LocalMatrix, GlobalMatrix or LocalStencil

  • VectorType – can be LocalVector or GlobalVector

  • ValueType – can be float, double, std::complex<float> or std::complex<double>

Subclassed by rocalution::DirectLinearSolver< OperatorType, VectorType, ValueType >, rocalution::IterativeLinearSolver< OperatorType, VectorType, ValueType >, rocalution::Preconditioner< OperatorType, VectorType, ValueType >

Public Functions

void SetOperator(const OperatorType &op)

Set the Operator of the solver.

virtual void ResetOperator(const OperatorType &op)

Reset the operator; see ReBuildNumeric()

virtual void Print(void) const = 0

Print information about the solver.

virtual void Solve(const VectorType &rhs, VectorType *x) = 0

Solve Operator x = rhs.

virtual void SolveZeroSol(const VectorType &rhs, VectorType *x)

Solve Operator x = rhs, setting initial x = 0.

virtual void Clear(void)

Clear (free all local data) the solver.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void BuildMoveToAcceleratorAsync(void)

Build the solver and move it to the accelerator asynchronously.

virtual void Sync(void)

Synchronize the solver.

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void MoveToHost(void)

Move all data (i.e. move the solver) to the host.

virtual void MoveToAccelerator(void)

Move all data (i.e. move the solver) to the accelerator.

virtual void Verbose(int verb = 1)

Provide verbose output of the solver.

  • verb = 0 -> no output

  • verb = 1 -> print info about the solver (start, end)

  • verb = 2 -> print (iter, residual) via iteration control

Iterative Linear Solvers

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::IterativeLinearSolver : public rocalution::Solver<OperatorType, VectorType, ValueType>

Base class for all linear iterative solvers.

The iterative solvers are controlled by an iteration control object, which monitors the convergence properties of the solver, i.e. maximum number of iterations, relative tolerance, absolute tolerance and divergence tolerance. The iteration control can also record the residual history and store it in an ASCII file.

All iterative solvers are controlled based on

  • Absolute stopping criterion, when \(|r_{k}|_{L_{p}} < \epsilon_{abs}\)

  • Relative stopping criterion, when \(|r_{k}|_{L_{p}} / |r_{1}|_{L_{p}} \leq \epsilon_{rel}\)

  • Divergence stopping criterion, when \(|r_{k}|_{L_{p}} / |r_{1}|_{L_{p}} \geq \epsilon_{div}\)

  • Maximum number of iterations \(N\), when \(k = N\)

where \(k\) is the current iteration, \(r_{k}\) the residual for the current iteration \(k\) (i.e. \(r_{k} = b - Ax_{k}\)) and \(r_{1}\) the starting residual (i.e. \(r_{1} = b - Ax_{init}\)). In addition, the minimum number of iterations \(M\) can be specified. In this case, the solver will not stop iterating before \(k \geq M\).

The \(L_{p}\) norm is used for the computation, where \(p\) could be 1, 2 and \(\infty\). The norm computation can be set with SetResidualNorm() with 1 for \(L_{1}\), 2 for \(L_{2}\) and 3 for \(L_{\infty}\). For the computation with \(L_{\infty}\), the index of the maximum value can be obtained with GetAmaxResidualIndex(). If this function is called and \(L_{\infty}\) was not selected, this function will return -1.

The criterion that was reached can be obtained with GetSolverStatus(), returning

  • 0, if no criterion has been reached yet

  • 1, if absolute tolerance has been reached

  • 2, if relative tolerance has been reached

  • 3, if divergence tolerance has been reached

  • 4, if maximum number of iterations has been reached
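
The stopping rules and status codes listed above can be combined into one small decision function. The sketch below is a hypothetical illustration: the struct IterationLimits, the function solver_status, and the order in which the criteria are tested (including skipping all checks before the minimum iteration count) are assumptions, not the library's iteration control.

```cpp
// Hypothetical container for the tolerances and iteration limits.
struct IterationLimits
{
    double abs_tol;
    double rel_tol;
    double div_tol;
    int    min_iter;
    int    max_iter;
};

// Returns the status code for the current residual norm `res`, the
// starting residual norm `res_init` and the iteration counter `iter`:
// 0 = keep iterating, 1 = absolute, 2 = relative, 3 = divergence,
// 4 = maximum number of iterations.
int solver_status(double res, double res_init, int iter, const IterationLimits& lim)
{
    // Assumption: no criterion is evaluated before the minimum iteration count.
    if(iter < lim.min_iter)
    {
        return 0;
    }
    if(res < lim.abs_tol)
    {
        return 1;
    }
    if(res_init > 0.0)
    {
        const double ratio = res / res_init;
        if(ratio <= lim.rel_tol)
        {
            return 2;
        }
        if(ratio >= lim.div_tol)
        {
            return 3;
        }
    }
    if(iter >= lim.max_iter)
    {
        return 4;
    }
    return 0;
}
```

A solver loop would call such a function once per iteration and stop as soon as the result is non-zero.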

Template Parameters
  • OperatorType – can be LocalMatrix, GlobalMatrix or LocalStencil

  • VectorType – can be LocalVector or GlobalVector

  • ValueType – can be float, double, std::complex<float> or std::complex<double>

Subclassed by rocalution::BaseMultiGrid< OperatorType, VectorType, ValueType >, rocalution::BiCGStab< OperatorType, VectorType, ValueType >, rocalution::BiCGStabl< OperatorType, VectorType, ValueType >, rocalution::CG< OperatorType, VectorType, ValueType >, rocalution::Chebyshev< OperatorType, VectorType, ValueType >, rocalution::CR< OperatorType, VectorType, ValueType >, rocalution::FCG< OperatorType, VectorType, ValueType >, rocalution::FGMRES< OperatorType, VectorType, ValueType >, rocalution::FixedPoint< OperatorType, VectorType, ValueType >, rocalution::GMRES< OperatorType, VectorType, ValueType >, rocalution::IDR< OperatorType, VectorType, ValueType >, rocalution::QMRCGStab< OperatorType, VectorType, ValueType >

Public Functions

void Init(double abs_tol, double rel_tol, double div_tol, int max_iter)

Initialize the solver with absolute/relative/divergence tolerance and maximum number of iterations.

void Init(double abs_tol, double rel_tol, double div_tol, int min_iter, int max_iter)

Initialize the solver with absolute/relative/divergence tolerance and minimum/maximum number of iterations.

void InitMinIter(int min_iter)

Set the minimum number of iterations.

void InitMaxIter(int max_iter)

Set the maximum number of iterations.

void InitTol(double abs, double rel, double div)

Set the absolute/relative/divergence tolerance.

void SetResidualNorm(int resnorm)

Set the residual norm to \(L_1\), \(L_2\) or \(L_\infty\) norm.

  • resnorm = 1 -> \(L_1\) norm

  • resnorm = 2 -> \(L_2\) norm

  • resnorm = 3 -> \(L_\infty\) norm

void RecordResidualHistory(void)

Record the residual history.

void RecordHistory(const std::string filename) const

Write the history to file.

virtual void Verbose(int verb = 1)

Set the solver verbosity output.

virtual void Solve(const VectorType &rhs, VectorType *x)

Solve Operator x = rhs.

virtual void SetPreconditioner(Solver<OperatorType, VectorType, ValueType> &precond)

Set a preconditioner of the linear solver.

virtual int GetIterationCount(void)

Return the iteration count.

virtual double GetCurrentResidual(void)

Return the current residual.

virtual int GetSolverStatus(void)

Return the current status.

virtual int GetAmaxResidualIndex(void)

Return absolute maximum index of residual vector when using \(L_\infty\) norm.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::FixedPoint : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>

Fixed-Point Iteration Scheme.

The Fixed-Point iteration scheme is based on additive splitting of the matrix \(A = M + N\). The scheme reads

\[ x_{k+1} = M^{-1} (b - N x_{k}). \]
It can also be reformulated as a weighted defect correction scheme
\[ x_{k+1} = x_{k} - \omega M^{-1} (Ax_{k} - b). \]
The inversion of \(M\) can be performed by preconditioners (Jacobi, Gauss-Seidel, ILU, etc.) or by any type of solvers.

Template Parameters
  • OperatorType – can be LocalMatrix, GlobalMatrix or LocalStencil

  • VectorType – can be LocalVector or GlobalVector

  • ValueType – can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

void SetRelaxation(ValueType omega)

Set relaxation parameter \(\omega\).

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void Clear(void)

Clear (free all local data) the solver.

template<class OperatorTypeH, class VectorTypeH, typename ValueTypeH, class OperatorTypeL, class VectorTypeL, typename ValueTypeL>
class rocalution::MixedPrecisionDC : public rocalution::IterativeLinearSolver<OperatorTypeH, VectorTypeH, ValueTypeH>

Mixed-Precision Defect Correction Scheme.

The Mixed-Precision solver is based on a defect-correction scheme. The current implementation of the library uses host-based correction in double precision and accelerator computation in single precision. The solver implements the scheme

\[ x_{k+1} = x_{k} + A^{-1} r_{k}, \]
where the computation of the residual \(r_{k} = b - Ax_{k}\) and the update \(x_{k+1} = x_{k} + d_{k}\) are performed on the host in double precision. The computation of the residual system \(Ad_{k} = r_{k}\) is performed on the accelerator in single precision. In addition to the setup functions of the iterative solver, the user needs to specify the inner ( \(Ad_{k} = r_{k}\)) solver.

Template Parameters
  • OperatorTypeH – can be LocalMatrix

  • VectorTypeH – can be LocalVector

  • ValueTypeH – can be double

  • OperatorTypeL – can be LocalMatrix

  • VectorTypeL – can be LocalVector

  • ValueTypeL – can be float

Public Functions

virtual void Print(void) const

Print information about the solver.

void Set(Solver<OperatorTypeL, VectorTypeL, ValueTypeL> &Solver_L)

Set the inner solver for \(Ad_{k} = r_{k}\).

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)

Clear (free all local data) the solver.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::Chebyshev : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>

Chebyshev Iteration Scheme.

The Chebyshev Iteration scheme (also known as acceleration scheme) is similar to the CG method but requires minimum and maximum eigenvalues of the operator. [1]

tparam OperatorType

- can be LocalMatrix, GlobalMatrix or LocalStencil

tparam VectorType

- can be LocalVector or GlobalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

void Set(ValueType lambda_min, ValueType lambda_max)

Set the minimum and maximum eigenvalues of the operator.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)

Clear (free all local data) the solver.

Krylov Subspace Solvers

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::BiCGStab : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>

Bi-Conjugate Gradient Stabilized Method.

The Bi-Conjugate Gradient Stabilized method is a variation of CGS and solves sparse (non) symmetric linear systems \(Ax=b\). [11]
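
For orientation, a minimal unpreconditioned BiCGStab loop on a dense matrix might look as follows. This is a sketch of the textbook algorithm, not of the library internals; `bicgstab_solve` and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense row-major matrix, for illustration only

double dot(const Vec& a, const Vec& b)
{
    double s = 0.0;
    for(std::size_t i = 0; i < a.size(); ++i)
        s += a[i] * b[i];
    return s;
}

double norm2(const Vec& a) { return std::sqrt(dot(a, a)); }

Vec matvec(const Mat& A, const Vec& x)
{
    Vec y(A.size(), 0.0);
    for(std::size_t i = 0; i < A.size(); ++i)
        y[i] = dot(A[i], x);
    return y;
}

// Unpreconditioned BiCGStab starting from x = 0.
Vec bicgstab_solve(const Mat& A, const Vec& b, double tol, int max_iter)
{
    const std::size_t n = b.size();
    assert(A.size() == n);
    Vec x(n, 0.0), r = b, p(n, 0.0), v(n, 0.0), s(n, 0.0), t;
    const Vec r_hat = r; // fixed shadow residual
    double rho = 1.0, alpha = 1.0, omega = 1.0;

    for(int k = 0; k < max_iter && norm2(r) > tol; ++k)
    {
        const double rho_new = dot(r_hat, r);
        const double beta    = (rho_new / rho) * (alpha / omega);
        for(std::size_t i = 0; i < n; ++i)
            p[i] = r[i] + beta * (p[i] - omega * v[i]);
        v     = matvec(A, p);
        alpha = rho_new / dot(r_hat, v);
        for(std::size_t i = 0; i < n; ++i)
            s[i] = r[i] - alpha * v[i];
        if(norm2(s) <= tol)
        {
            // The BiCG half-step already converged; skip the stabilization step
            for(std::size_t i = 0; i < n; ++i)
                x[i] += alpha * p[i];
            break;
        }
        t     = matvec(A, s);
        omega = dot(t, s) / dot(t, t); // local residual minimization
        for(std::size_t i = 0; i < n; ++i)
        {
            x[i] += alpha * p[i] + omega * s[i];
            r[i] = s[i] - omega * t[i];
        }
        rho = rho_new;
    }
    return x;
}
```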

tparam OperatorType

- can be LocalMatrix, GlobalMatrix or LocalStencil

tparam VectorType

- can be LocalVector or GlobalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)

Clear (free all local data) the solver.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::BiCGStabl : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>

Bi-Conjugate Gradient Stabilized (l) Method.

The Bi-Conjugate Gradient Stabilized (l) method is a generalization of BiCGStab for solving sparse (non) symmetric linear systems \(Ax=b\). It minimizes residuals over \(l\)-dimensional Krylov subspaces. The degree \(l\) can be set with SetOrder(). [4]

tparam OperatorType

- can be LocalMatrix or GlobalMatrix

tparam VectorType

- can be LocalVector or GlobalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)

Clear (free all local data) the solver.

virtual void SetOrder(int l)

Set the order.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::CG : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>

Conjugate Gradient Method.

The Conjugate Gradient method is the best known iterative method for solving sparse symmetric positive definite (SPD) linear systems \(Ax=b\). It is based on orthogonal projection onto the Krylov subspace \(\mathcal{K}_{m}(r_{0}, A)\), where \(r_{0}\) is the initial residual. The method can be preconditioned, where the approximation should also be SPD. [11]
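
A minimal sketch of the unpreconditioned method on a dense SPD matrix is given below. It only illustrates the algorithm; `cg_solve` and the dense `Mat` type are assumptions of this example and do not correspond to library functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense row-major matrix, for illustration only

double dot(const Vec& a, const Vec& b)
{
    double s = 0.0;
    for(std::size_t i = 0; i < a.size(); ++i)
        s += a[i] * b[i];
    return s;
}

Vec matvec(const Mat& A, const Vec& x)
{
    Vec y(A.size(), 0.0);
    for(std::size_t i = 0; i < A.size(); ++i)
        y[i] = dot(A[i], x);
    return y;
}

// Unpreconditioned CG for an SPD matrix A, starting from x = 0.
Vec cg_solve(const Mat& A, const Vec& b, double tol, int max_iter)
{
    const std::size_t n = b.size();
    assert(A.size() == n);
    Vec x(n, 0.0);
    Vec r = b; // r_0 = b - A * 0
    Vec p = r; // first search direction
    double rr = dot(r, r);

    for(int k = 0; k < max_iter && std::sqrt(rr) > tol; ++k)
    {
        const Vec    Ap    = matvec(A, p);
        const double alpha = rr / dot(p, Ap); // step length along p
        for(std::size_t i = 0; i < n; ++i)
        {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rr_new = dot(r, r);
        const double beta   = rr_new / rr; // keeps directions A-conjugate
        for(std::size_t i = 0; i < n; ++i)
            p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return x;
}
```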

tparam OperatorType

- can be LocalMatrix, GlobalMatrix or LocalStencil

tparam VectorType

- can be LocalVector or GlobalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void BuildMoveToAcceleratorAsync(void)

Build the solver and move it to the accelerator asynchronously.

virtual void Sync(void)

Synchronize the solver.

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)

Clear (free all local data) the solver.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::CR : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>

Conjugate Residual Method.

The Conjugate Residual method is an iterative method for solving sparse symmetric positive semi-definite linear systems \(Ax=b\). It is a Krylov subspace method and differs from the much more popular Conjugate Gradient method in that the system matrix is not required to be positive definite. The method can be preconditioned, where the approximation should also be SPD or positive semi-definite. [11]
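
The difference to CG shows up in the step lengths, which are built from \(A\)-weighted residual products. The following sketch of the unpreconditioned textbook recurrence uses dense storage; `cr_solve` and the helpers are hypothetical names, not library functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense row-major matrix, for illustration only

double dot(const Vec& a, const Vec& b)
{
    double s = 0.0;
    for(std::size_t i = 0; i < a.size(); ++i)
        s += a[i] * b[i];
    return s;
}

Vec matvec(const Mat& A, const Vec& x)
{
    Vec y(A.size(), 0.0);
    for(std::size_t i = 0; i < A.size(); ++i)
        y[i] = dot(A[i], x);
    return y;
}

// Unpreconditioned Conjugate Residual method for symmetric A, starting from x = 0.
Vec cr_solve(const Mat& A, const Vec& b, double tol, int max_iter)
{
    const std::size_t n = b.size();
    assert(A.size() == n);
    Vec x(n, 0.0), r = b, p = r;
    Vec Ar = matvec(A, r);
    Vec Ap = Ar;
    double rAr = dot(r, Ar);

    for(int k = 0; k < max_iter && std::sqrt(dot(r, r)) > tol; ++k)
    {
        const double alpha = rAr / dot(Ap, Ap); // minimizes the residual norm along p
        for(std::size_t i = 0; i < n; ++i)
        {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        Ar = matvec(A, r);
        const double rAr_new = dot(r, Ar);
        const double beta    = rAr_new / rAr;
        for(std::size_t i = 0; i < n; ++i)
        {
            p[i]  = r[i] + beta * p[i];
            Ap[i] = Ar[i] + beta * Ap[i]; // A*p updated without an extra product
        }
        rAr = rAr_new;
    }
    return x;
}
```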

tparam OperatorType

- can be LocalMatrix, GlobalMatrix or LocalStencil

tparam VectorType

- can be LocalVector or GlobalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)

Clear (free all local data) the solver.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::FCG : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>

Flexible Conjugate Gradient Method.

The Flexible Conjugate Gradient method is an iterative method for solving sparse symmetric positive definite linear systems \(Ax=b\). It is similar to the Conjugate Gradient method, with the only difference that the preconditioner \(M^{-1}\) is not required to be a constant operator. This can be especially helpful if the operation \(M^{-1}x\) is the result of another iterative process. [9]

tparam OperatorType

- can be LocalMatrix or GlobalMatrix

tparam VectorType

- can be LocalVector or GlobalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)

Clear (free all local data) the solver.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::GMRES : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>

Generalized Minimum Residual Method.

The Generalized Minimum Residual method (GMRES) is a projection method for solving sparse (non) symmetric linear systems \(Ax=b\), based on a restarting technique. The solution is approximated in a Krylov subspace \(\mathcal{K}=\mathcal{K}_{m}\) and \(\mathcal{L}=A\mathcal{K}_{m}\) with minimal residual, where \(\mathcal{K}_{m}\) is the \(m\)-th Krylov subspace with \(v_{1} = r_{0}/||r_{0}||_{2}\). [11]

The Krylov subspace basis size can be set using SetBasisSize(). The default size is 30.
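
To make the role of the basis size concrete, here is a sketch of restarted GMRES(m) with Arnoldi orthogonalization and Givens rotations on a dense matrix. It follows the textbook formulation and is not the library implementation; `gmres_solve` and its parameters are assumptions of this example, with `m` playing the role of the basis size.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense row-major matrix, for illustration only

double dot(const Vec& a, const Vec& b)
{
    double s = 0.0;
    for(std::size_t i = 0; i < a.size(); ++i)
        s += a[i] * b[i];
    return s;
}

Vec matvec(const Mat& A, const Vec& x)
{
    Vec y(A.size(), 0.0);
    for(std::size_t i = 0; i < A.size(); ++i)
        y[i] = dot(A[i], x);
    return y;
}

// Restarted GMRES(m), unpreconditioned, starting from x = 0.
Vec gmres_solve(const Mat& A, const Vec& b, int m, double tol, int max_restarts)
{
    const std::size_t n = b.size();
    assert(A.size() == n && m > 0);
    Vec x(n, 0.0);
    for(int cycle = 0; cycle < max_restarts; ++cycle)
    {
        Vec r = matvec(A, x);
        for(std::size_t j = 0; j < n; ++j)
            r[j] = b[j] - r[j];
        const double beta = std::sqrt(dot(r, r));
        if(beta <= tol)
            break;

        std::vector<Vec> V(m + 1, Vec(n, 0.0));    // Krylov basis vectors
        Mat H(m + 1, Vec(m, 0.0));                 // Hessenberg matrix, reduced to triangular
        Vec cs(m, 0.0), sn(m, 0.0), g(m + 1, 0.0); // Givens rotations and rotated rhs
        for(std::size_t j = 0; j < n; ++j)
            V[0][j] = r[j] / beta;
        g[0] = beta;

        int k = 0;
        while(k < m)
        {
            // Arnoldi step with modified Gram-Schmidt
            Vec w = matvec(A, V[k]);
            for(int i = 0; i <= k; ++i)
            {
                H[i][k] = dot(w, V[i]);
                for(std::size_t j = 0; j < n; ++j)
                    w[j] -= H[i][k] * V[i][j];
            }
            H[k + 1][k] = std::sqrt(dot(w, w));
            if(H[k + 1][k] > 0.0)
                for(std::size_t j = 0; j < n; ++j)
                    V[k + 1][j] = w[j] / H[k + 1][k];

            // Apply previous rotations to the new column, then eliminate H[k+1][k]
            for(int i = 0; i < k; ++i)
            {
                const double tmp = cs[i] * H[i][k] + sn[i] * H[i + 1][k];
                H[i + 1][k]      = -sn[i] * H[i][k] + cs[i] * H[i + 1][k];
                H[i][k]          = tmp;
            }
            const double denom = std::hypot(H[k][k], H[k + 1][k]);
            cs[k]              = H[k][k] / denom;
            sn[k]              = H[k + 1][k] / denom;
            H[k][k]            = denom;
            H[k + 1][k]        = 0.0;
            g[k + 1]           = -sn[k] * g[k];
            g[k]               = cs[k] * g[k];
            ++k;
            if(std::fabs(g[k]) <= tol) // |g[k]| is the current residual norm
                break;
        }

        // Back substitution for the k coefficients, then update x
        Vec y(k, 0.0);
        for(int i = k - 1; i >= 0; --i)
        {
            double s = g[i];
            for(int j = i + 1; j < k; ++j)
                s -= H[i][j] * y[j];
            y[i] = s / H[i][i];
        }
        for(int i = 0; i < k; ++i)
            for(std::size_t j = 0; j < n; ++j)
                x[j] += y[i] * V[i][j];
    }
    return x;
}
```

A larger `m` stores more basis vectors per cycle but usually needs fewer restarts; this is the trade-off the basis size controls.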

tparam OperatorType

- can be LocalMatrix, GlobalMatrix or LocalStencil

tparam VectorType

- can be LocalVector or GlobalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)

Clear (free all local data) the solver.

virtual void SetBasisSize(int size_basis)

Set the size of the Krylov subspace basis.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::FGMRES : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>

Flexible Generalized Minimum Residual Method.

The Flexible Generalized Minimum Residual method (FGMRES) is a projection method for solving sparse (non) symmetric linear systems \(Ax=b\). It is similar to the GMRES method, with the only difference that FGMRES is based on a window shifting of the Krylov subspace and thus does not require the preconditioner \(M^{-1}\) to be a constant operator. This can be especially helpful if the operation \(M^{-1}x\) is the result of another iterative process. [11]

The Krylov subspace basis size can be set using SetBasisSize(). The default size is 30.

tparam OperatorType

- can be LocalMatrix, GlobalMatrix or LocalStencil

tparam VectorType

- can be LocalVector or GlobalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)

Clear (free all local data) the solver.

virtual void SetBasisSize(int size_basis)

Set the size of the Krylov subspace basis.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::IDR : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>

Induced Dimension Reduction Method.

The Induced Dimension Reduction method is a Krylov subspace method for solving sparse (non) symmetric linear systems \(Ax=b\). IDR(s) generates residuals in a sequence of nested subspaces. [12] [14]

The dimension of the shadow space can be set by SetShadowSpace(). The default size of the shadow space is 4.

tparam OperatorType

- can be LocalMatrix, GlobalMatrix or LocalStencil

tparam VectorType

- can be LocalVector or GlobalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)

Clear (free all local data) the solver.

void SetShadowSpace(int s)

Set the size of the Shadow Space.

void SetRandomSeed(unsigned long long seed)

Set the random seed for the creation of the orthonormal basis (ONB). The seed must be greater than 0.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::QMRCGStab : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>

Quasi-Minimal Residual Conjugate Gradient Stabilized Method.

The Quasi-Minimal Residual Conjugate Gradient Stabilized method is a variant of the Krylov subspace BiCGStab method for solving sparse (non) symmetric linear systems \(Ax=b\). [7]

tparam OperatorType

- can be LocalMatrix or GlobalMatrix

tparam VectorType

- can be LocalVector or GlobalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)

Clear (free all local data) the solver.

MultiGrid Solvers

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::BaseMultiGrid : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>

Base class for all multigrid solvers [13].

tparam OperatorType

- can be LocalMatrix or GlobalMatrix

tparam VectorType

- can be LocalVector or GlobalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Subclassed by rocalution::BaseAMG< OperatorType, VectorType, ValueType >, rocalution::MultiGrid< OperatorType, VectorType, ValueType >

Public Functions

virtual void Print(void) const

Print information about the solver.

void SetSolver(Solver<OperatorType, VectorType, ValueType> &solver)

Set the coarse grid solver.

void SetSmoother(IterativeLinearSolver<OperatorType, VectorType, ValueType> **smoother)

Set the smoother for each level.

void SetSmootherPreIter(int iter)

Set the number of pre-smoothing steps.

void SetSmootherPostIter(int iter)

Set the number of post-smoothing steps.

virtual void SetRestrictOperator(OperatorType **op) = 0

Set the restriction operator for each level.

virtual void SetProlongOperator(OperatorType **op) = 0

Set the prolongation operator for each level.

virtual void SetOperatorHierarchy(OperatorType **op) = 0

Set the operator for each level.

void SetScaling(bool scaling)

Enable/disable scaling of intergrid transfers.

void SetHostLevels(int levels)

Force computation of coarser levels on the host backend.

void SetCycle(unsigned int cycle)

Set the MultiGrid Cycle (default: Vcycle)

void SetKcycleFull(bool kcycle_full)

Set the MultiGrid Kcycle on all levels or only on finest level.

void InitLevels(int levels)

Set the depth of the multigrid solver.

virtual void Solve(const VectorType &rhs, VectorType *x)

Solve Operator x = rhs.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void Clear(void)

Clear (free all local data) the solver.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::MultiGrid : public rocalution::BaseMultiGrid<OperatorType, VectorType, ValueType>

MultiGrid Method.

The MultiGrid method can be used with external data, such as externally computed restriction, prolongation and operator hierarchy. The user needs to pass all this information to the solver for each level. This includes the smoothing step, prolongation/restriction, grid traversing and the coarse grid solver. [13]

  • Restriction and prolongation operations can be performed in two ways, based on Restriction() and Prolongation() of the LocalVector class, or by matrix-vector multiplication. This is configured by a set function.

  • Smoothers can be any iterative linear solver. Valid options are Jacobi, Gauss-Seidel, ILU, etc. using a FixedPoint iteration scheme with a pre-defined number of iterations. The smoothers could also be solvers such as CG, BiCGStab, etc.

  • The coarse grid solver can be any iterative linear solver. The class also provides mechanisms to specify whether the coarse grid solver is performed on the host or on the accelerator. The coarse grid solver can be preconditioned.

  • Grid scaling is based on an \(L_2\) norm ratio.

  • Operator matrices need to be passed on each grid level.
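
How these ingredients interact can be sketched with a recursive V-cycle for the 1D Poisson problem. Everything in this example is an assumption chosen for brevity (weighted Jacobi smoothing, full-weighting restriction, linear interpolation, a rediscretized coarse operator, and the names `v_cycle` and `solve_poisson_1d`); it does not use the library classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Residual r = f - A u for A = tridiag(-1, 2, -1) / h^2 with zero Dirichlet boundaries.
Vec residual(const Vec& u, const Vec& f, double h)
{
    const std::size_t n = u.size();
    Vec r(n);
    for(std::size_t i = 0; i < n; ++i)
    {
        const double left  = (i > 0) ? u[i - 1] : 0.0;
        const double right = (i + 1 < n) ? u[i + 1] : 0.0;
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h);
    }
    return r;
}

// Smoother: weighted Jacobi with a pre-defined number of sweeps.
void smooth(Vec& u, const Vec& f, double h, int sweeps)
{
    const double omega = 2.0 / 3.0;
    for(int s = 0; s < sweeps; ++s)
    {
        const Vec r = residual(u, f, h);
        for(std::size_t i = 0; i < u.size(); ++i)
            u[i] += omega * 0.5 * h * h * r[i]; // D^{-1} = h^2 / 2
    }
}

// One V-cycle on a grid with n = 2^k - 1 interior points.
void v_cycle(Vec& u, const Vec& f, double h, int pre, int post)
{
    const std::size_t n = u.size();
    if(n == 1)
    {
        u[0] = 0.5 * h * h * f[0]; // coarse grid solver: exact solve
        return;
    }
    smooth(u, f, h, pre); // pre-smoothing

    // Restriction of the residual (full weighting)
    const Vec         r  = residual(u, f, h);
    const std::size_t nc = (n - 1) / 2;
    Vec rc(nc), ec(nc, 0.0);
    for(std::size_t j = 0; j < nc; ++j)
        rc[j] = 0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2];

    v_cycle(ec, rc, 2.0 * h, pre, post); // coarse grid correction

    // Prolongation of the correction (linear interpolation)
    for(std::size_t j = 0; j < nc; ++j)
    {
        u[2 * j] += 0.5 * ec[j];
        u[2 * j + 1] += ec[j];
        u[2 * j + 2] += 0.5 * ec[j];
    }
    smooth(u, f, h, post); // post-smoothing
}

// Solve -u'' = 1 on (0, 1) with u(0) = u(1) = 0 using a fixed number of V-cycles.
Vec solve_poisson_1d(std::size_t n, int cycles)
{
    assert(n >= 1 && ((n + 1) & n) == 0); // n must be 2^k - 1
    const double h = 1.0 / static_cast<double>(n + 1);
    Vec u(n, 0.0), f(n, 1.0);
    for(int c = 0; c < cycles; ++c)
        v_cycle(u, f, h, 2, 2);
    return u;
}
```

In this sketch the transfer operators are hard-coded stencils; with the MultiGrid class they would instead be supplied per level, as described in the list above.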

tparam OperatorType

- can be LocalMatrix or GlobalMatrix

tparam VectorType

- can be LocalVector or GlobalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void SetRestrictOperator(OperatorType **op)

Set the restriction operator for each level.

virtual void SetProlongOperator(OperatorType **op)

Set the prolongation operator for each level.

virtual void SetOperatorHierarchy(OperatorType **op)

Set the operator for each level.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::BaseAMG : public rocalution::BaseMultiGrid<OperatorType, VectorType, ValueType>

Base class for all algebraic multigrid solvers.

The Algebraic MultiGrid solver is based on the BaseMultiGrid class. The coarsening is obtained by different aggregation techniques. The smoothers can be constructed inside or outside of the class.

All parameters in the Algebraic MultiGrid class can be set externally, including smoothers and coarse grid solver.

tparam OperatorType

- can be LocalMatrix or GlobalMatrix

tparam VectorType

- can be LocalVector or GlobalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Subclassed by rocalution::GlobalPairwiseAMG< OperatorType, VectorType, ValueType >, rocalution::PairwiseAMG< OperatorType, VectorType, ValueType >, rocalution::RugeStuebenAMG< OperatorType, VectorType, ValueType >, rocalution::SAAMG< OperatorType, VectorType, ValueType >, rocalution::UAAMG< OperatorType, VectorType, ValueType >

Public Functions

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void Clear(void)

Clear (free all local data) the solver.

virtual void ClearLocal(void)

Clear all local data.

virtual void BuildHierarchy(void)

Create AMG hierarchy.

virtual void BuildSmoothers(void)

Create AMG smoothers.

void SetCoarsestLevel(int coarse_size)

Set coarsest level for hierarchy creation.

void SetManualSmoothers(bool sm_manual)

Set flag to pass smoothers manually for each level.

void SetManualSolver(bool s_manual)

Set flag to pass coarse grid solver manually.

void SetDefaultSmootherFormat(unsigned int op_format)

Set the smoother operator format.

void SetOperatorFormat(unsigned int op_format)

Set the operator format.

int GetNumLevels(void)

Returns the number of levels in hierarchy.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::UAAMG : public rocalution::BaseAMG<OperatorType, VectorType, ValueType>

Unsmoothed Aggregation Algebraic MultiGrid Method.

The Unsmoothed Aggregation Algebraic MultiGrid method is based on an unsmoothed-aggregation interpolation scheme. [stueben]

tparam OperatorType

- can be LocalMatrix

tparam VectorType

- can be LocalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void BuildSmoothers(void)

Create AMG smoothers.

void SetCouplingStrength(ValueType eps)

Set coupling strength.

void SetOverInterp(ValueType overInterp)

Set over-interpolation parameter for aggregation.

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::SAAMG : public rocalution::BaseAMG<OperatorType, VectorType, ValueType>

Smoothed Aggregation Algebraic MultiGrid Method.

The Smoothed Aggregation Algebraic MultiGrid method is based on a smoothed-aggregation interpolation scheme. [15]

tparam OperatorType

- can be LocalMatrix

tparam VectorType

- can be LocalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void BuildSmoothers(void)

Create AMG smoothers.

void SetCouplingStrength(ValueType eps)

Set coupling strength.

void SetInterpRelax(ValueType relax)

Set the relaxation parameter.

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::RugeStuebenAMG : public rocalution::BaseAMG<OperatorType, VectorType, ValueType>

Ruge-Stueben Algebraic MultiGrid Method.

The Ruge-Stueben Algebraic MultiGrid method is based on the classic Ruge-Stueben coarsening with direct interpolation. The solver is highly efficient in terms of solver complexity (i.e. number of iterations). However, most of the time it has a more expensive building step and requires more memory. [stueben]

tparam OperatorType

- can be LocalMatrix

tparam VectorType

- can be LocalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void BuildSmoothers(void)

Create AMG smoothers.

void SetCouplingStrength(ValueType eps)

Set coupling strength.

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::PairwiseAMG : public rocalution::BaseAMG<OperatorType, VectorType, ValueType>

Pairwise Aggregation Algebraic MultiGrid Method.

The Pairwise Aggregation Algebraic MultiGrid method is based on a pairwise aggregation matching scheme. It delivers a very efficient building phase, which is suitable for Poisson-like equations. Most of the time it requires a K-cycle for the solving phase to achieve a low number of iterations. [10]

tparam OperatorType

- can be LocalMatrix

tparam VectorType

- can be LocalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void BuildHierarchy(void)

Create AMG hierarchy.

virtual void ClearLocal(void)

Clear all local data.

virtual void BuildSmoothers(void)

Create AMG smoothers.

void SetBeta(ValueType beta)

Set beta for pairwise aggregation.

void SetOrdering(unsigned int ordering)

Set re-ordering for aggregation.

void SetCoarseningFactor(double factor)

Set target coarsening factor.

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::GlobalPairwiseAMG : public rocalution::BaseAMG<OperatorType, VectorType, ValueType>

Pairwise Aggregation Algebraic MultiGrid Method (multi-node)

The Pairwise Aggregation Algebraic MultiGrid method is based on a pairwise aggregation matching scheme. It delivers a very efficient building phase, which is suitable for Poisson-like equations. Most of the time it requires a K-cycle for the solving phase to achieve a low number of iterations. This version has multi-node support. [10]

tparam OperatorType

- can be GlobalMatrix

tparam VectorType

- can be GlobalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void BuildHierarchy(void)

Create AMG hierarchy.

virtual void ClearLocal(void)

Clear all local data.

virtual void SetBeta(ValueType beta)

Set beta for pairwise aggregation.

virtual void SetOrdering(const _aggregation_ordering ordering)

Set re-ordering for aggregation.

virtual void SetCoarseningFactor(double factor)

Set target coarsening factor.

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

Direct Solvers

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::DirectLinearSolver : public rocalution::Solver<OperatorType, VectorType, ValueType>

Base class for all direct linear solvers.

The library provides three direct methods: LU, QR and Inversion (based on QR decomposition). The user can pass a sparse matrix; internally it is converted to a dense matrix and then the selected method is applied. These methods are not optimized and, because the matrix is converted to a dense format, they should be used only for very small matrices.

tparam OperatorType

- can be LocalMatrix

tparam VectorType

- can be LocalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Subclassed by rocalution::Inversion< OperatorType, VectorType, ValueType >, rocalution::LU< OperatorType, VectorType, ValueType >, rocalution::QR< OperatorType, VectorType, ValueType >

Public Functions

virtual void Verbose(int verb = 1)

Provide verbose output of the solver.

  • verb = 0 -> no output

  • verb = 1 -> print info about the solver (start, end)

  • verb = 2 -> print (iter, residual) via iteration control

virtual void Solve(const VectorType &rhs, VectorType *x)

Solve Operator x = rhs.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::Inversion : public rocalution::DirectLinearSolver<OperatorType, VectorType, ValueType>

Matrix Inversion.

Full matrix inversion based on QR decomposition.

tparam OperatorType

- can be LocalMatrix

tparam VectorType

- can be LocalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void Clear(void)

Clear (free all local data) the solver.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::LU : public rocalution::DirectLinearSolver<OperatorType, VectorType, ValueType>

LU Decomposition.

Lower-Upper Decomposition factors a given square matrix into a lower and an upper triangular matrix, such that \(A = LU\).
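
The factorization and the two triangular solves that follow it can be sketched on a small dense matrix. This is a plain Doolittle elimination without pivoting, written only for illustration; `lu_solve` is a hypothetical name and whether the library pivots is not stated here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense row-major matrix, for illustration only

// Solve A x = b via A = L U (Doolittle, no pivoting).
// A and b are taken by value because both are overwritten.
Vec lu_solve(Mat A, Vec b)
{
    const std::size_t n = A.size();
    assert(b.size() == n);

    // Factorization: multipliers of L (unit diagonal) are stored below the
    // diagonal, U is stored on and above the diagonal.
    for(std::size_t k = 0; k < n; ++k)
    {
        assert(std::fabs(A[k][k]) > 0.0); // no pivoting in this sketch
        for(std::size_t i = k + 1; i < n; ++i)
        {
            A[i][k] /= A[k][k];
            for(std::size_t j = k + 1; j < n; ++j)
                A[i][j] -= A[i][k] * A[k][j];
        }
    }

    // Forward substitution: L y = b
    for(std::size_t i = 0; i < n; ++i)
        for(std::size_t j = 0; j < i; ++j)
            b[i] -= A[i][j] * b[j];

    // Backward substitution: U x = y
    for(std::size_t i = n; i-- > 0;)
    {
        for(std::size_t j = i + 1; j < n; ++j)
            b[i] -= A[i][j] * b[j];
        b[i] /= A[i][i];
    }
    return b;
}
```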

tparam OperatorType

- can be LocalMatrix

tparam VectorType

- can be LocalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void Clear(void)

Clear (free all local data) the solver.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::QR : public rocalution::DirectLinearSolver<OperatorType, VectorType, ValueType>

QR Decomposition.

The QR Decomposition decomposes a given matrix into \(A = QR\), such that \(Q\) is an orthogonal matrix and \(R\) an upper triangular matrix.
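
Once \(A = QR\) is available, a linear system reduces to the triangular solve \(Rx = Q^{T}b\). The sketch below builds the factors with modified Gram-Schmidt on a dense matrix; this choice of orthogonalization and the name `qr_solve` are assumptions of the example, not a description of the library internals.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense row-major matrix, for illustration only

double dot(const Vec& a, const Vec& b)
{
    double s = 0.0;
    for(std::size_t i = 0; i < a.size(); ++i)
        s += a[i] * b[i];
    return s;
}

// Solve A x = b for a square, nonsingular A via A = Q R.
Vec qr_solve(const Mat& A, const Vec& b)
{
    const std::size_t n = A.size();
    assert(b.size() == n);

    std::vector<Vec> Q(n, Vec(n)); // Q[j] holds the j-th column of Q
    Mat R(n, Vec(n, 0.0));         // upper triangular factor
    for(std::size_t j = 0; j < n; ++j)
        for(std::size_t i = 0; i < n; ++i)
            Q[j][i] = A[i][j]; // start from the columns of A

    // Modified Gram-Schmidt orthogonalization
    for(std::size_t j = 0; j < n; ++j)
    {
        R[j][j] = std::sqrt(dot(Q[j], Q[j]));
        assert(R[j][j] > 0.0);
        for(std::size_t i = 0; i < n; ++i)
            Q[j][i] /= R[j][j];
        for(std::size_t k = j + 1; k < n; ++k)
        {
            R[j][k] = dot(Q[j], Q[k]);
            for(std::size_t i = 0; i < n; ++i)
                Q[k][i] -= R[j][k] * Q[j][i];
        }
    }

    // Back substitution for R x = Q^T b
    Vec x(n, 0.0);
    for(std::size_t i = n; i-- > 0;)
    {
        double s = dot(Q[i], b);
        for(std::size_t j = i + 1; j < n; ++j)
            s -= R[i][j] * x[j];
        x[i] = s / R[i][i];
    }
    return x;
}
```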

tparam OperatorType

- can be LocalMatrix

tparam VectorType

- can be LocalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void Clear(void)

Clear (free all local data) the solver.

Preconditioners

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::Preconditioner : public rocalution::Solver<OperatorType, VectorType, ValueType>

Base class for all preconditioners.

tparam OperatorType

- can be LocalMatrix or GlobalMatrix

tparam VectorType

- can be LocalVector or GlobalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Subclassed by rocalution::AIChebyshev< OperatorType, VectorType, ValueType >, rocalution::AS< OperatorType, VectorType, ValueType >, rocalution::BlockJacobi< OperatorType, VectorType, ValueType >, rocalution::BlockPreconditioner< OperatorType, VectorType, ValueType >, rocalution::DiagJacobiSaddlePointPrecond< OperatorType, VectorType, ValueType >, rocalution::FSAI< OperatorType, VectorType, ValueType >, rocalution::GS< OperatorType, VectorType, ValueType >, rocalution::IC< OperatorType, VectorType, ValueType >, rocalution::ILU< OperatorType, VectorType, ValueType >, rocalution::ILUT< OperatorType, VectorType, ValueType >, rocalution::Jacobi< OperatorType, VectorType, ValueType >, rocalution::MultiColored< OperatorType, VectorType, ValueType >, rocalution::MultiElimination< OperatorType, VectorType, ValueType >, rocalution::SGS< OperatorType, VectorType, ValueType >, rocalution::SPAI< OperatorType, VectorType, ValueType >, rocalution::TNS< OperatorType, VectorType, ValueType >, rocalution::VariablePreconditioner< OperatorType, VectorType, ValueType >

Public Functions

virtual void SolveZeroSol(const VectorType &rhs, VectorType *x)

Solve Operator x = rhs, setting initial x = 0.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::AIChebyshev : public rocalution::Preconditioner<OperatorType, VectorType, ValueType>

Approximate Inverse - Chebyshev Preconditioner.

The Approximate Inverse - Chebyshev Preconditioner is an inverse matrix preconditioner with values from a linear combination of matrix-valued Chebyshev polynomials. [3]

tparam OperatorType

- can be LocalMatrix

tparam VectorType

- can be LocalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Solve(const VectorType &rhs, VectorType *x)

Solve Operator x = rhs.

void Set(int p, ValueType lambda_min, ValueType lambda_max)

Set order, min and max eigenvalues.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void Clear(void)

Clear (free all local data) the solver.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::FSAI : public rocalution::Preconditioner<OperatorType, VectorType, ValueType>

Factorized Approximate Inverse Preconditioner.

The Factorized Sparse Approximate Inverse preconditioner computes a direct approximation of \(M^{-1}\) by minimizing the Frobenius norm \(||I - GL||_{F}\), where \(L\) denotes the exact lower triangular part of \(A\) and \(G:=M^{-1}\). The FSAI preconditioner is initialized by \(q\), based on the sparsity pattern of \(|A^{q}|\). However, it is also possible to supply external sparsity patterns in the form of the LocalMatrix class. [6]

Note

The FSAI preconditioner is only suited for symmetric positive definite matrices.

tparam OperatorType

- can be LocalMatrix

tparam VectorType

- can be LocalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Solve(const VectorType &rhs, VectorType *x)

Solve Operator x = rhs.

void Set(int power)

Set the power of the system matrix sparsity pattern.

void Set(const OperatorType &pattern)

Set an external sparsity pattern.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void Clear(void)

Clear (free all local data) the solver.

void SetPrecondMatrixFormat(unsigned int mat_format, int blockdim = 1)

Set the matrix format of the preconditioner.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::SPAI : public rocalution::Preconditioner<OperatorType, VectorType, ValueType>

SParse Approximate Inverse Preconditioner.

The SParse Approximate Inverse algorithm is an explicitly computed preconditioner for general sparse linear systems. In its current implementation, only the sparsity pattern of the system matrix is supported. The SPAI computation is based on the minimization of the Frobenius norm \(||AM - I||_{F}\). [5]

tparam OperatorType

- can be LocalMatrix

tparam VectorType

- can be LocalVector

tparam ValueType

- can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Solve(const VectorType &rhs, VectorType *x)

Solve Operator x = rhs.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void Clear(void)

Clear (free all local data) the solver.

void SetPrecondMatrixFormat(unsigned int mat_format, int blockdim = 1)

Set the matrix format of the preconditioner.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::TNS : public rocalution::Preconditioner<OperatorType, VectorType, ValueType>

Truncated Neumann Series Preconditioner.

The Truncated Neumann Series (TNS) preconditioner is based on \(M^{-1} = K^{T} D^{-1} K\), where \(K=(I-LD^{-1}+(LD^{-1})^{2})\), with the diagonal \(D\) of \(A\) and the strictly lower triangular part \(L\) of \(A\). The preconditioner can be computed in two forms: explicitly and implicitly. In the implicit form, the full construction of \(M\) is performed via matrix-matrix operations, whereas in the explicit form, the application of the preconditioner is based on matrix-vector operations only. The matrix format for the stored matrices can be specified.
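
Applying \(M^{-1}\) with matrix-vector operations only can be sketched directly from the formula above. The example uses a dense matrix, and the names `tns_apply`, `apply_LDinv` and `apply_DinvLt` are hypothetical; it is not the library code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense row-major matrix, for illustration only

// y = L D^{-1} u, with L the strictly lower triangular part and D the diagonal of A
Vec apply_LDinv(const Mat& A, const Vec& u)
{
    const std::size_t n = u.size();
    Vec y(n, 0.0);
    for(std::size_t i = 0; i < n; ++i)
        for(std::size_t j = 0; j < i; ++j)
            y[i] += A[i][j] * u[j] / A[j][j];
    return y;
}

// y = D^{-1} L^T u, the transpose of the operator above
Vec apply_DinvLt(const Mat& A, const Vec& u)
{
    const std::size_t n = u.size();
    Vec y(n, 0.0);
    for(std::size_t i = 0; i < n; ++i)
    {
        for(std::size_t j = i + 1; j < n; ++j)
            y[i] += A[j][i] * u[j];
        y[i] /= A[i][i];
    }
    return y;
}

// z = M^{-1} r = K^T D^{-1} K r with K = I - L D^{-1} + (L D^{-1})^2
Vec tns_apply(const Mat& A, const Vec& r)
{
    const std::size_t n = r.size();
    assert(A.size() == n);
    for(std::size_t i = 0; i < n; ++i)
        assert(std::fabs(A[i][i]) > 0.0); // D must be invertible

    // z = D^{-1} K r
    const Vec t1 = apply_LDinv(A, r);
    const Vec t2 = apply_LDinv(A, t1);
    Vec z(n);
    for(std::size_t i = 0; i < n; ++i)
        z[i] = (r[i] - t1[i] + t2[i]) / A[i][i];

    // z = K^T z, using K^T = I - D^{-1} L^T + (D^{-1} L^T)^2
    const Vec s1 = apply_DinvLt(A, z);
    const Vec s2 = apply_DinvLt(A, s1);
    for(std::size_t i = 0; i < n; ++i)
        z[i] = z[i] - s1[i] + s2[i];
    return z;
}
```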

Template Parameters
  • OperatorType – can be LocalMatrix

  • VectorType – can be LocalVector

  • ValueType – can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

void Set(bool imp)

Set implicit (true) or explicit (false) computation.

virtual void Solve(const VectorType &rhs, VectorType *x)

Solve Operator x = rhs.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void Clear(void)

Clear (free all local data) the solver.

void SetPrecondMatrixFormat(unsigned int mat_format, int blockdim = 1)

Set the matrix format of the preconditioner.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::AS : public rocalution::Preconditioner<OperatorType, VectorType, ValueType>

Additive Schwarz Preconditioner.

The Additive Schwarz preconditioner relies on a preconditioning technique, where the linear system \(Ax=b\) can be decomposed into small sub-problems based on \(A_{i} = R_{i}^{T}AR_{i}\), where \(R_{i}\) are restriction operators. Those restriction operators produce sub-matrices which overlap. This leads to contributions from two preconditioners on the overlapped area, which are scaled by \(1/2\). [2]
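A minimal dense sketch of one additive Schwarz application may help to picture the overlap handling. It is not the library's implementation: `Block`, `dense_solve` and `additive_schwarz_apply` are hypothetical names, the local sub-problems are solved exactly by Gaussian elimination instead of by user-supplied preconditioners, and entries covered by several blocks are averaged, which gives the \(1/2\) scaling when exactly two blocks overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense row-major storage, for illustration only

struct Block
{
    std::size_t begin; // first index of the block
    std::size_t end;   // one past the last index of the block
};

// Gaussian elimination without pivoting; assumes all pivots are non-zero.
Vec dense_solve(Mat a, Vec b)
{
    const std::size_t n = a.size();
    for(std::size_t k = 0; k < n; ++k)
    {
        assert(a[k][k] != 0.0);
        for(std::size_t i = k + 1; i < n; ++i)
        {
            const double f = a[i][k] / a[k][k];
            for(std::size_t j = k; j < n; ++j)
                a[i][j] -= f * a[k][j];
            b[i] -= f * b[k];
        }
    }
    Vec x(n);
    for(std::size_t i = n; i-- > 0;)
    {
        double s = b[i];
        for(std::size_t j = i + 1; j < n; ++j)
            s -= a[i][j] * x[j];
        x[i] = s / a[i][i];
    }
    return x;
}

// z = sum over blocks of (local solve of the restricted residual), averaged on overlaps.
Vec additive_schwarz_apply(const Mat& a, const Vec& r, const std::vector<Block>& blocks)
{
    const std::size_t n = a.size();
    Vec z(n, 0.0);
    std::vector<int> hits(n, 0);
    for(const Block& blk : blocks)
    {
        assert(blk.begin < blk.end && blk.end <= n);
        const std::size_t q = blk.end - blk.begin;
        Mat ai(q, Vec(q));
        Vec ri(q);
        for(std::size_t i = 0; i < q; ++i)
        {
            ri[i] = r[blk.begin + i];
            for(std::size_t j = 0; j < q; ++j)
                ai[i][j] = a[blk.begin + i][blk.begin + j];
        }
        const Vec zi = dense_solve(ai, ri);
        for(std::size_t i = 0; i < q; ++i)
        {
            z[blk.begin + i] += zi[i];
            ++hits[blk.begin + i];
        }
    }
    for(std::size_t i = 0; i < n; ++i)
        if(hits[i] > 1)
            z[i] /= hits[i];
    return z;
}
```

With two blocks `{0, 2}` and `{1, 3}` on a 3x3 matrix, index 1 belongs to both blocks and receives the mean of the two local solutions.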

Template Parameters
  • OperatorType – can be LocalMatrix

  • VectorType – can be LocalVector

  • ValueType – can be float, double, std::complex<float> or std::complex<double>

Subclassed by rocalution::RAS< OperatorType, VectorType, ValueType >

Public Functions

virtual void Print(void) const

Print information about the solver.

void Set(int nb, int overlap, Solver<OperatorType, VectorType, ValueType> **preconds)

Set number of blocks, overlap and array of preconditioners.

virtual void Solve(const VectorType &rhs, VectorType *x)

Solve Operator x = rhs.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void Clear(void)

Clear (free all local data) the solver.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::RAS : public rocalution::AS<OperatorType, VectorType, ValueType>

Restricted Additive Schwarz Preconditioner.

The Restricted Additive Schwarz preconditioner relies on a preconditioning technique, where the linear system \(Ax=b\) can be decomposed into small sub-problems based on \(A_{i} = R_{i}^{T}AR_{i}\), where \(R_{i}\) are restriction operators. The RAS method is a mixture of block Jacobi and the AS scheme. In this case, the sub-matrices contain overlapped areas from other blocks, too. [2]
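One common formulation of restricted additive Schwarz solves each sub-problem on the overlapping index range but writes back only the entries the block owns, so no scaling on the overlap is needed. The sketch below follows that formulation on a dense toy matrix; it is an assumption that this matches the library's variant, and `Subdomain`, `dense_solve` and `restricted_schwarz_apply` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense row-major storage, for illustration only

struct Subdomain
{
    std::size_t begin;     // overlapping range [begin, end) used for the local solve
    std::size_t end;
    std::size_t own_begin; // non-overlapping range [own_begin, own_end) written back
    std::size_t own_end;
};

// Gaussian elimination without pivoting; assumes all pivots are non-zero.
Vec dense_solve(Mat a, Vec b)
{
    const std::size_t n = a.size();
    for(std::size_t k = 0; k < n; ++k)
    {
        assert(a[k][k] != 0.0);
        for(std::size_t i = k + 1; i < n; ++i)
        {
            const double f = a[i][k] / a[k][k];
            for(std::size_t j = k; j < n; ++j)
                a[i][j] -= f * a[k][j];
            b[i] -= f * b[k];
        }
    }
    Vec x(n);
    for(std::size_t i = n; i-- > 0;)
    {
        double s = b[i];
        for(std::size_t j = i + 1; j < n; ++j)
            s -= a[i][j] * x[j];
        x[i] = s / a[i][i];
    }
    return x;
}

// Local solves on overlapping ranges, results kept only on the owned ranges.
Vec restricted_schwarz_apply(const Mat& a, const Vec& r, const std::vector<Subdomain>& subs)
{
    const std::size_t n = a.size();
    Vec z(n, 0.0);
    for(const Subdomain& sd : subs)
    {
        assert(sd.begin <= sd.own_begin && sd.own_end <= sd.end && sd.end <= n);
        const std::size_t q = sd.end - sd.begin;
        Mat ai(q, Vec(q));
        Vec ri(q);
        for(std::size_t i = 0; i < q; ++i)
        {
            ri[i] = r[sd.begin + i];
            for(std::size_t j = 0; j < q; ++j)
                ai[i][j] = a[sd.begin + i][sd.begin + j];
        }
        const Vec zi = dense_solve(ai, ri);
        for(std::size_t i = sd.own_begin; i < sd.own_end; ++i)
            z[i] = zi[i - sd.begin];
    }
    return z;
}
```

If the owned ranges equal the solve ranges (no overlap), this reduces to a block Jacobi application.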

Template Parameters
  • OperatorType – can be LocalMatrix

  • VectorType – can be LocalVector

  • ValueType – can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Solve(const VectorType &rhs, VectorType *x)

Solve Operator x = rhs.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::BlockJacobi : public rocalution::Preconditioner<OperatorType, VectorType, ValueType>

Block-Jacobi Preconditioner.

The Block-Jacobi preconditioner is designed to wrap any local preconditioner and apply it in a global block fashion locally on each interior matrix.

Template Parameters
  • OperatorType – can be GlobalMatrix

  • VectorType – can be GlobalVector

  • ValueType – can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

void Set(Solver<LocalMatrix<ValueType>, LocalVector<ValueType>, ValueType> &precond)

Set local preconditioner.

virtual void Solve(const VectorType &rhs, VectorType *x)

Solve Operator x = rhs.

virtual void SolveZeroSol(const VectorType &rhs, VectorType *x)

Solve Operator x = rhs, setting initial x = 0.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void ReBuildNumeric(void)

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)

Clear (free all local data) the solver.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::BlockPreconditioner : public rocalution::Preconditioner<OperatorType, VectorType, ValueType>

Block-Preconditioner.

When handling vector fields, typically one can try to use different preconditioners and/or solvers for the different blocks. For such problems, the library provides a block-type preconditioner. This preconditioner builds the following block-type matrix

\[\begin{split} P = \begin{pmatrix} A_{d} & 0 & . & 0 \\ B_{1} & B_{d} & . & 0 \\ . & . & . & . \\ Z_{1} & Z_{2} & . & Z_{d} \end{pmatrix} \end{split}\]
The system with \(P\) can be solved in two ways. It can be solved by block-lower-triangular sweeps, inverting the diagonal blocks \(A_{d} \ldots Z_{d}\) and multiplying by the corresponding off-diagonal blocks. This mode is selected by SetLSolver() and is the default solution scheme. Alternatively, only the inverses of the diagonal blocks \(A_{d} \ldots Z_{d}\) are applied (Block-Jacobi type) by using SetDiagonalSolver().
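The difference between the two modes can be sketched on a dense toy matrix. This is an illustration only, not the library's code: `dense_solve` and `block_solve` are hypothetical names, the diagonal blocks are inverted exactly by Gaussian elimination rather than by user-supplied solvers, and the flag `lower_sweep` stands in for the choice between the two modes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense row-major storage, for illustration only

// Gaussian elimination without pivoting; assumes all pivots are non-zero.
Vec dense_solve(Mat a, Vec b)
{
    const std::size_t n = a.size();
    for(std::size_t k = 0; k < n; ++k)
    {
        assert(a[k][k] != 0.0);
        for(std::size_t i = k + 1; i < n; ++i)
        {
            const double f = a[i][k] / a[k][k];
            for(std::size_t j = k; j < n; ++j)
                a[i][j] -= f * a[k][j];
            b[i] -= f * b[k];
        }
    }
    Vec x(n);
    for(std::size_t i = n; i-- > 0;)
    {
        double s = b[i];
        for(std::size_t j = i + 1; j < n; ++j)
            s -= a[i][j] * x[j];
        x[i] = s / a[i][i];
    }
    return x;
}

// Solve with the block matrix P, block by block.
// lower_sweep == true : block forward substitution (uses the blocks below the diagonal)
// lower_sweep == false: diagonal blocks only (Block-Jacobi type)
Vec block_solve(const Mat& p, const Vec& r, const std::vector<std::size_t>& sizes, bool lower_sweep)
{
    const std::size_t n = p.size();
    Vec x(n, 0.0);
    std::size_t start = 0;
    for(std::size_t q : sizes)
    {
        assert(start + q <= n);
        Mat d(q, Vec(q));
        Vec rhs(q);
        for(std::size_t i = 0; i < q; ++i)
        {
            rhs[i] = r[start + i];
            if(lower_sweep)
                for(std::size_t j = 0; j < start; ++j)
                    rhs[i] -= p[start + i][j] * x[j]; // already computed blocks
            for(std::size_t j = 0; j < q; ++j)
                d[i][j] = p[start + i][start + j];
        }
        const Vec xi = dense_solve(d, rhs);
        for(std::size_t i = 0; i < q; ++i)
            x[start + i] = xi[i];
        start += q;
    }
    return x;
}
```

In the sweep mode each block sees the solution of all previous blocks, whereas in the diagonal mode the blocks are fully decoupled and could be processed independently.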

Template Parameters
  • OperatorType – can be LocalMatrix

  • VectorType – can be LocalVector

  • ValueType – can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Clear(void)

Clear (free all local data) the solver.

void Set(int n, const int *size, Solver<OperatorType, VectorType, ValueType> **D_solver)

Set number, size and diagonal solver.

void SetDiagonalSolver(void)

Set diagonal solver mode.

void SetLSolver(void)

Set lower triangular sweep mode.

void SetExternalLastMatrix(const OperatorType &mat)

Set external last block matrix.

virtual void SetPermutation(const LocalVector<int> &perm)

Set permutation vector.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void Solve(const VectorType &rhs, VectorType *x)

Solve Operator x = rhs.

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::Jacobi : public rocalution::Preconditioner<OperatorType, VectorType, ValueType>

Jacobi Method.

The Jacobi method is for solving a diagonally dominant system of linear equations \(Ax=b\). It solves for each diagonal element iteratively until convergence, such that

\[ x_{i}^{(k+1)} = (1 - \omega)x_{i}^{(k)} + \frac{\omega}{a_{ii}} \left( b_{i} - \sum\limits_{j=1}^{i-1}{a_{ij}x_{j}^{(k)}} - \sum\limits_{j=i+1}^{n}{a_{ij}x_{j}^{(k)}} \right) \]
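A weighted Jacobi sweep updates every component from the previous iterate only, using the off-diagonal entries of row \(i\). The following dense sketch illustrates this; the names `jacobi_sweep` and `jacobi_solve` are hypothetical, a fixed iteration count replaces a convergence check, and the code is not taken from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense row-major storage, for illustration only

// One weighted Jacobi sweep: x_new depends on the old iterate x only.
Vec jacobi_sweep(const Mat& a, const Vec& b, const Vec& x, double omega)
{
    const std::size_t n = a.size();
    assert(b.size() == n && x.size() == n);
    Vec x_new(n);
    for(std::size_t i = 0; i < n; ++i)
    {
        assert(a[i][i] != 0.0);
        double sum = b[i];
        for(std::size_t j = 0; j < n; ++j)
            if(j != i)
                sum -= a[i][j] * x[j];
        x_new[i] = (1.0 - omega) * x[i] + omega * sum / a[i][i];
    }
    return x_new;
}

// Repeat the sweep a fixed number of times, starting from x = 0.
Vec jacobi_solve(const Mat& a, const Vec& b, double omega, int iterations)
{
    Vec x(b.size(), 0.0);
    for(int k = 0; k < iterations; ++k)
        x = jacobi_sweep(a, b, x, omega);
    return x;
}
```

Because each component depends only on the previous iterate, all rows of a sweep can be processed in parallel.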

Template Parameters
  • OperatorType – can be LocalMatrix or GlobalMatrix

  • VectorType – can be LocalVector or GlobalVector

  • ValueType – can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Solve(const VectorType &rhs, VectorType *x)

Solve Operator x = rhs.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void Clear(void)

Clear (free all local data) the solver.

virtual void ResetOperator(const OperatorType &op)

Reset the operator; see ReBuildNumeric()

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::GS : public rocalution::Preconditioner<OperatorType, VectorType, ValueType>

Gauss-Seidel / Successive Over-Relaxation Method.

The Gauss-Seidel / SOR method is for solving a system of linear equations \(Ax=b\). It approximates the solution iteratively with

\[ x_{i}^{(k+1)} = (1 - \omega) x_{i}^{(k)} + \frac{\omega}{a_{ii}} \left( b_{i} - \sum\limits_{j=1}^{i-1}{a_{ij}x_{j}^{(k+1)}} - \sum\limits_{j=i+1}^{n}{a_{ij}x_{j}^{(k)}} \right), \]
with \(\omega \in (0,2)\).
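In contrast to Jacobi, an SOR sweep overwrites the iterate in place, so row \(i\) already uses the updated values of all rows before it. A minimal dense sketch, with the hypothetical name `sor_sweep` and no claim to mirror the library's implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense row-major storage, for illustration only

// One forward SOR sweep, updating x in place.
// omega == 1 gives the plain Gauss-Seidel method.
void sor_sweep(const Mat& a, const Vec& b, Vec& x, double omega)
{
    const std::size_t n = a.size();
    assert(b.size() == n && x.size() == n);
    for(std::size_t i = 0; i < n; ++i)
    {
        assert(a[i][i] != 0.0);
        double sum = b[i];
        for(std::size_t j = 0; j < n; ++j)
            if(j != i)
                sum -= a[i][j] * x[j]; // x[j] is already updated for j < i
        x[i] = (1.0 - omega) * x[i] + omega * sum / a[i][i];
    }
}
```

Calling `sor_sweep` repeatedly on the same vector `x` yields the iteration; the in-place update is what introduces the sequential dependency between rows.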

Template Parameters
  • OperatorType – can be LocalMatrix

  • VectorType – can be LocalVector

  • ValueType – can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.

virtual void Solve(const VectorType &rhs, VectorType *x)

Solve Operator x = rhs.

virtual void Build(void)

Build the solver (data allocation, structure and numerical computation)

virtual void Clear(void)

Clear (free all local data) the solver.

virtual void ResetOperator(const OperatorType &op)

Reset the operator; see ReBuildNumeric()

template<class OperatorType, class VectorType, typename ValueType>
class rocalution::SGS : public rocalution::Preconditioner<OperatorType, VectorType, ValueType>

Symmetric Gauss-Seidel / Symmetric Successive Over-Relaxation Method.

The Symmetric Gauss-Seidel / SSOR method is for solving a system of linear equations \(Ax=b\). It approximates the solution iteratively.
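A symmetric sweep is commonly formed by a forward SOR pass over the rows followed by a backward pass in reverse order. The sketch below shows this on a dense toy matrix; `ssor_sweep` is a hypothetical name, and it is an assumption that the library's method follows the same textbook formulation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense row-major storage, for illustration only

// One symmetric sweep: forward pass (rows 0..n-1), then backward pass (rows n-1..0).
// omega == 1 gives the symmetric Gauss-Seidel method.
void ssor_sweep(const Mat& a, const Vec& b, Vec& x, double omega)
{
    const std::size_t n = a.size();
    assert(b.size() == n && x.size() == n);

    // In-place relaxation of a single row using the current content of x.
    auto update = [&](std::size_t i) {
        assert(a[i][i] != 0.0);
        double sum = b[i];
        for(std::size_t j = 0; j < n; ++j)
            if(j != i)
                sum -= a[i][j] * x[j];
        x[i] = (1.0 - omega) * x[i] + omega * sum / a[i][i];
    };

    for(std::size_t i = 0; i < n; ++i)
        update(i);
    for(std::size_t i = n; i-- > 0;)
        update(i);
}
```

Starting from `x = 0` with `omega = 1`, a single sweep computes \((D+U)^{-1} D (D+L)^{-1} b\), where \(D\), \(L\) and \(U\) are the diagonal, strictly lower and strictly upper parts of \(A\).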

Template Parameters
  • OperatorType – can be LocalMatrix

  • VectorType – can be LocalVector

  • ValueType – can be float, double, std::complex<float> or std::complex<double>

Public Functions

virtual void Print(void) const

Print information about the solver.