API

This section provides a detailed list of the library API.

Host Utility Functions

template<typename DataType>
void rocalution::allocate_host(int64_t size, DataType **ptr)

Allocate buffer on the host.

allocate_host allocates a buffer on the host.

Parameters

size – [in] number of elements for which the buffer is allocated
ptr – [out] pointer to the position in memory where the buffer should be allocated; it is expected that *ptr == NULL

Template Parameters

DataType – can be char, int, unsigned int, float, double, std::complex<float> or std::complex<double>.

template<typename DataType>
void rocalution::free_host(DataType **ptr)

Free buffer on the host.

free_host deallocates a buffer on the host. *ptr will be set to NULL after successful deallocation.

Parameters: ptr – [inout] pointer to the position in memory where the buffer should be deallocated; it is expected that *ptr != NULL
Template Parameters: DataType – can be char, int, unsigned int, float, double, std::complex<float> or std::complex<double>.

template<typename DataType> void rocalution::set_to_zero_host(int64_t size, DataType *ptr)

Set a host buffer to zero.

set_to_zero_host sets a host buffer to zero.

Parameters

size – [in] number of elements
ptr – [inout] pointer to the host buffer

Template Parameters

DataType – can be char, int, unsigned int, float, double, std::complex<float> or std::complex<double>.
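
The pre- and postconditions documented for the three helpers above can be mimicked in standalone C++. The sketch below is not the library's implementation: the `sketch` namespace and its functions are hypothetical stand-ins that only mirror the stated contract (`*ptr == NULL` before allocation, `*ptr` reset to NULL after deallocation).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <new>

namespace sketch // hypothetical stand-in, not part of rocALUTION
{
    // Allocate size elements; the caller is expected to pass *ptr == nullptr.
    template <typename DataType>
    void allocate_host(int64_t size, DataType** ptr)
    {
        assert(ptr != nullptr && *ptr == nullptr);
        if(size > 0)
        {
            *ptr = new(std::nothrow) DataType[size];
            assert(*ptr != nullptr);
        }
    }

    // Free the buffer and reset *ptr to nullptr.
    template <typename DataType>
    void free_host(DataType** ptr)
    {
        assert(ptr != nullptr && *ptr != nullptr);
        delete[] *ptr;
        *ptr = nullptr;
    }

    // Set size elements of a host buffer to zero.
    template <typename DataType>
    void set_to_zero_host(int64_t size, DataType* ptr)
    {
        if(size > 0)
        {
            assert(ptr != nullptr);
            std::fill_n(ptr, size, DataType());
        }
    }
} // namespace sketch
```

With the real library, the same call order applies: allocate into a pointer that is NULL, use the buffer, then free it and rely on the pointer being NULL again.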

double rocalution::rocalution_time(void): Return current time in microseconds.

Backend Manager

int rocalution::init_rocalution(int rank = -1, int dev_per_node = 1)

Initialize rocALUTION platform.

init_rocalution defines a backend descriptor with information about the hardware and its specifications. All objects created after that contain a copy of this descriptor. If the specifications of the global descriptor are changed (e.g. a different number of threads is set) and new objects are created, only the new objects will use the new configuration.

For control, the library provides the following functions:

set_device_rocalution() is a unified function to select a specific device. If the library has been compiled with a backend that offers several devices, this function selects a particular one. It has to be called before init_rocalution().
set_omp_threads_rocalution() sets the number of OpenMP threads. This function has to be called after init_rocalution().

Example

#include <rocalution/rocalution.hpp>

using namespace rocalution;

int main(int argc, char* argv[])
{
    init_rocalution();

    // ...

    stop_rocalution();

    return 0;
}

Parameters

rank – [in] MPI rank, when running in a multi-node environment
dev_per_node – [in] number of accelerator devices per node, when running in a multi-GPU environment

int rocalution::stop_rocalution(void)

Shutdown rocALUTION platform.

stop_rocalution shuts down the rocALUTION platform.

void rocalution::set_device_rocalution(int dev)

Set the accelerator device.

set_device_rocalution lets the user select the accelerator device that is supposed to be used for the computation.

Parameters: dev – [in] accelerator device ID for computation

void rocalution::set_omp_threads_rocalution(int nthreads)

Set number of OpenMP threads.

The number of threads which rocALUTION will use can be set with set_omp_threads_rocalution or by the global OpenMP environment variable (for Unix-like OS this is OMP_NUM_THREADS). During the initialization phase, the library provides affinity thread-core mapping:

If the number of cores (including SMT cores) is greater than or equal to two times the number of threads, then all the threads can occupy every second core ID (e.g. 0, 2, 4, \(\ldots\)). This avoids having two threads working on the same physical core when SMT is enabled.
If the number of threads is less than or equal to the number of cores (including SMT), and the previous clause is false, then the threads can occupy every core ID (e.g. 0, 1, 2, 3, \(\ldots\)).
If none of the above criteria is met, then the default thread-core mapping is used (typically set by the OS).

Note

The thread-core mapping is available only for Unix-like OS.

Note

The user can disable the thread affinity by calling set_omp_affinity_rocalution(), before initializing the library (i.e. before init_rocalution()).

Parameters: nthreads – [in] number of OpenMP threads
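
The three thread-core mapping rules listed for set_omp_threads_rocalution can be written down as a small decision function. This is a minimal sketch of the documented rules only; `affinity_stride` and `core_for_thread` are hypothetical names, and the way the library actually pins threads is not shown here.

```cpp
#include <cassert>

// Hypothetical helper: core-ID stride chosen by the three documented rules.
//   2 -> every second core ID (0, 2, 4, ...)
//   1 -> every core ID (0, 1, 2, ...)
//   0 -> no explicit mapping, the OS default applies
inline int affinity_stride(int ncores, int nthreads)
{
    assert(ncores > 0 && nthreads > 0);
    if(ncores >= 2 * nthreads)
    {
        return 2;
    }
    if(nthreads <= ncores)
    {
        return 1;
    }
    return 0;
}

// Core ID for a given thread under the chosen stride, or -1 for the OS default.
inline int core_for_thread(int thread_id, int ncores, int nthreads)
{
    const int stride = affinity_stride(ncores, nthreads);
    return (stride == 0) ? -1 : thread_id * stride;
}
```

For example, 8 threads on 16 logical cores fall under the first rule, 12 threads on 16 cores under the second, and 16 threads on 8 cores under the third.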

void rocalution::set_omp_affinity_rocalution(bool affinity)

Enable/disable OpenMP host affinity.

set_omp_affinity_rocalution enables / disables OpenMP host affinity.

Parameters: affinity – [in] boolean to turn on/off OpenMP host affinity

void rocalution::set_omp_threshold_rocalution(int threshold)

Set OpenMP threshold size.

For small problems, the OpenMP host backend can be (slightly) slower than running without OpenMP. This is mainly attributed to the small amount of work that each thread performs relative to the large overhead of forking/joining threads. This can be avoided with the OpenMP threshold size parameter in rocALUTION. The default threshold is set to 10000, which means that all matrices of this size or smaller will use only one thread (regardless of the number of OpenMP threads set in the system). The threshold can be modified with set_omp_threshold_rocalution.

Parameters: threshold – [in] OpenMP threshold size
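
The threshold rule described for set_omp_threshold_rocalution amounts to a single comparison. The sketch below illustrates it under the assumption that "size" is one integer per problem; `effective_threads` is a hypothetical name, not a library function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of threads used for a problem of the given size.
// Sizes up to and including the threshold run on a single thread.
inline int effective_threads(int64_t size, int64_t threshold, int nthreads)
{
    assert(nthreads >= 1);
    return (size <= threshold) ? 1 : nthreads;
}
```

With the default threshold of 10000 and 8 OpenMP threads, a problem of size 10000 would run on one thread and a problem of size 10001 on all eight.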

void rocalution::info_rocalution(void)

Print info about rocALUTION.

info_rocalution prints information about the rocALUTION platform.

void rocalution::info_rocalution(const struct Rocalution_Backend_Descriptor &backend_descriptor)

Print info about specific rocALUTION backend descriptor.

info_rocalution prints information about the rocALUTION platform of the specific backend descriptor.

Parameters: backend_descriptor – [in] rocALUTION backend descriptor

void rocalution::disable_accelerator_rocalution(bool onoff = true)

Disable/Enable the accelerator.

If you want to disable the accelerator (without re-compiling the code), you need to call disable_accelerator_rocalution before init_rocalution().

Parameters: onoff – [in] boolean to turn on/off the accelerator

void rocalution::_rocalution_sync(void)

Sync rocALUTION.

_rocalution_sync blocks the host until all active asynchronous transfers are completed (this is a global sync).

Base Rocalution

template<typename ValueType> class BaseRocalution : public rocalution::RocalutionObj

Base class for all operators and vectors.

Template Parameters: ValueType – can be int, float, double, std::complex<float> and std::complex<double>

Subclassed by rocalution::Operator< ValueType >, rocalution::Vector< ValueType >

Public Functions

virtual void MoveToAccelerator(void) = 0: Move the object to the accelerator backend.

virtual void MoveToHost(void) = 0: Move the object to the host backend.

virtual void MoveToAcceleratorAsync(void): Move the object to the accelerator backend with async move.

virtual void MoveToHostAsync(void): Move the object to the host backend with async move.

virtual void Sync(void): Sync (the async move)

virtual void CloneBackend(const BaseRocalution<ValueType> &src)

Clone the Backend descriptor from another object.

With CloneBackend, the backend can be cloned without copying any data. This is especially useful if several objects should reside on the same backend but keep their original data.

Example

LocalVector<ValueType> vec;
LocalMatrix<ValueType> mat;

// Allocate and initialize vec and mat
// ...

LocalVector<ValueType> tmp;
// By cloning backend, tmp and vec will have the same backend as mat
tmp.CloneBackend(mat);
vec.CloneBackend(mat);

// The following matrix vector multiplication will be performed on the backend
// selected in mat
mat.Apply(vec, &tmp);

Parameters: src – [in] Object from which the backend is cloned.

virtual void Info(void) const = 0

Print object information.

Info can print object information about any rocALUTION object. This information consists of object properties and backend data.

Example

mat.Info();
vec.Info();

virtual void Clear(void) = 0: Clear (free all data) the object.

Operator

template<typename ValueType> class Operator : public rocalution::BaseRocalution<ValueType>

Operator class.

The Operator class defines the generic interface for applying an operator (e.g. matrix or stencil) from/to global and local vectors.

Template Parameters: ValueType – can be int, float, double, std::complex<float> and std::complex<double>

Subclassed by rocalution::GlobalMatrix< ValueType >, rocalution::LocalMatrix< ValueType >, rocalution::LocalStencil< ValueType >

Public Functions

virtual int64_t GetM(void) const = 0: Return the number of rows in the matrix/stencil.

virtual int64_t GetN(void) const = 0: Return the number of columns in the matrix/stencil.

virtual int64_t GetNnz(void) const = 0: Return the number of non-zeros in the matrix/stencil.

virtual int64_t GetLocalM(void) const: Return the number of rows in the local matrix/stencil.

virtual int64_t GetLocalN(void) const: Return the number of columns in the local matrix/stencil.

virtual int64_t GetLocalNnz(void) const: Return the number of non-zeros in the local matrix/stencil.

virtual int64_t GetGhostM(void) const: Return the number of rows in the ghost matrix/stencil.

virtual int64_t GetGhostN(void) const: Return the number of columns in the ghost matrix/stencil.

virtual int64_t GetGhostNnz(void) const: Return the number of non-zeros in the ghost matrix/stencil.

virtual void Transpose(void): Transpose the operator.

virtual void Apply(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const: Apply the operator, out = Operator(in), where in and out are local vectors.

virtual void ApplyAdd(const LocalVector<ValueType> &in, ValueType scalar, LocalVector<ValueType> *out) const: Apply and add the operator, out += scalar * Operator(in), where in and out are local vectors.

virtual void Apply(const GlobalVector<ValueType> &in, GlobalVector<ValueType> *out) const: Apply the operator, out = Operator(in), where in and out are global vectors.

virtual void ApplyAdd(const GlobalVector<ValueType> &in, ValueType scalar, GlobalVector<ValueType> *out) const: Apply and add the operator, out += scalar * Operator(in), where in and out are global vectors.
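
The difference between Apply (out = Operator(in)) and ApplyAdd (out += scalar * Operator(in)) can be illustrated with a small dense matrix acting on `std::vector`. This is a sketch of the formulas only: `DenseOp`, `apply` and `apply_add` are hypothetical stand-ins and do not use any rocALUTION type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense stand-in for an operator: rows x cols, row-major storage.
struct DenseOp
{
    std::size_t         rows;
    std::size_t         cols;
    std::vector<double> val;
};

// out = A * in (previous content of out is overwritten)
inline void apply(const DenseOp& A, const std::vector<double>& in, std::vector<double>* out)
{
    assert(out != nullptr && in.size() == A.cols);
    out->assign(A.rows, 0.0);
    for(std::size_t i = 0; i < A.rows; ++i)
    {
        for(std::size_t j = 0; j < A.cols; ++j)
        {
            (*out)[i] += A.val[i * A.cols + j] * in[j];
        }
    }
}

// out += scalar * A * in (previous content of out is kept and updated)
inline void apply_add(const DenseOp&             A,
                      const std::vector<double>& in,
                      double                     scalar,
                      std::vector<double>*       out)
{
    assert(out != nullptr && in.size() == A.cols && out->size() == A.rows);
    for(std::size_t i = 0; i < A.rows; ++i)
    {
        for(std::size_t j = 0; j < A.cols; ++j)
        {
            (*out)[i] += scalar * A.val[i * A.cols + j] * in[j];
        }
    }
}
```

The practical consequence is that the output vector of ApplyAdd has to hold meaningful values before the call, whereas Apply does not read them.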

Vector

template<typename ValueType> class Vector : public rocalution::BaseRocalution<ValueType>

Vector class.

The Vector class defines the generic interface for local and global vectors.

Template Parameters: ValueType – can be int, float, double, std::complex<float> and std::complex<double>

Subclassed by rocalution::LocalVector< int >, rocalution::GlobalVector< ValueType >, rocalution::LocalVector< ValueType >

virtual void CopyFrom(const LocalVector<ValueType> &src)

Copy vector from another vector.

CopyFrom copies values from another vector.

Example

LocalVector<ValueType> vec1, vec2;

// Allocate and initialize vec1 and vec2
// ...

// Move vec1 to accelerator
vec1.MoveToAccelerator();

// Now, vec1 is on the accelerator (if available)
// and vec2 is on the host

// Copying vec2 into vec1 (or vice versa) moves data between host and
// accelerator backend
vec1.CopyFrom(vec2);

Note

This function allows cross-platform copying. One of the objects could be allocated on the accelerator backend.

Parameters: src – [in] Vector from which the values are copied.

virtual void CopyFrom(const GlobalVector<ValueType> &src)

Copy vector from another vector.

CopyFrom copies values from another vector.

Example

LocalVector<ValueType> vec1, vec2;

// Allocate and initialize vec1 and vec2
// ...

// Move vec1 to accelerator
vec1.MoveToAccelerator();

// Now, vec1 is on the accelerator (if available)
// and vec2 is on the host

// Copying vec2 into vec1 (or vice versa) moves data between host and
// accelerator backend
vec1.CopyFrom(vec2);

Note

This function allows cross-platform copying. One of the objects could be allocated on the accelerator backend.

Parameters: src – [in] Vector from which the values are copied.

virtual void CloneFrom(const LocalVector<ValueType> &src)

Clone the vector.

CloneFrom clones the entire vector, with data and backend descriptor from another Vector.

Example

LocalVector<ValueType> vec;

// Allocate and initialize vec (host or accelerator)
// ...

LocalVector<ValueType> tmp;

// By cloning vec, tmp will have identical values and will be on the same
// backend as vec
tmp.CloneFrom(vec);

Parameters: src – [in] Vector to clone from.

virtual void CloneFrom(const GlobalVector<ValueType> &src)

Clone the vector.

CloneFrom clones the entire vector, with data and backend descriptor from another Vector.

Example

LocalVector<ValueType> vec;

// Allocate and initialize vec (host or accelerator)
// ...

LocalVector<ValueType> tmp;

// By cloning vec, tmp will have identical values and will be on the same
// backend as vec
tmp.CloneFrom(vec);

Parameters: src – [in] Vector to clone from.

Public Functions

virtual int64_t GetSize(void) const = 0: Return the size of the vector.

virtual int64_t GetLocalSize(void) const: Return the size of the local vector.

virtual bool Check(void) const = 0

Perform a sanity check of the vector.

Checks whether the vector contains valid data, i.e. whether the values are neither infinity nor NaN (not a number).

Return values

true – if the vector is ok (empty vector is also ok).
false – if there is something wrong with the values.
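
The validity criterion described for Check (no infinity, no NaN, empty vector accepted) can be sketched on a plain host array. `check_values` is a hypothetical stand-in; the library may perform additional checks that are not documented here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the documented criterion on a raw host array.
inline bool check_values(const double* val, int64_t size)
{
    assert(size == 0 || val != nullptr);
    for(int64_t i = 0; i < size; ++i)
    {
        // std::isfinite is false for +inf, -inf and NaN
        if(!std::isfinite(val[i]))
        {
            return false;
        }
    }
    // An empty array passes, matching "empty vector is also ok"
    return true;
}
```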

virtual void Clear(void) = 0: Clear (free all data) the object.

virtual void Zeros(void) = 0: Set all values of the vector to 0.

virtual void Ones(void) = 0: Set all values of the vector to 1.

virtual void SetValues(ValueType val) = 0: Set all values of the vector to given argument.

virtual void SetRandomUniform(unsigned long long seed, ValueType a = static_cast<ValueType>(-1), ValueType b = static_cast<ValueType>(1)) = 0: Fill the vector with random values from interval [a,b].

virtual void SetRandomNormal(unsigned long long seed, ValueType mean = static_cast<ValueType>(0), ValueType var = static_cast<ValueType>(1)) = 0: Fill the vector with random values from normal distribution.

virtual void ReadFileASCII(const std::string &filename) = 0

Read vector from ASCII file.

Read a vector from ASCII file.

Example

LocalVector<ValueType> vec;
vec.ReadFileASCII("my_vector.dat");

Parameters: filename – [in] name of the file containing the ASCII data.

virtual void WriteFileASCII(const std::string &filename) const = 0

Write vector to ASCII file.

Write a vector to ASCII file.

Example

LocalVector<ValueType> vec;

// Allocate and fill vec
// ...

vec.WriteFileASCII("my_vector.dat");

Parameters: filename – [in] name of the file to write the ASCII data to.

virtual void ReadFileBinary(const std::string &filename) = 0

Read vector from binary file.

Read a vector from binary file. For details on the format, see WriteFileBinary().

Example

LocalVector<ValueType> vec;
vec.ReadFileBinary("my_vector.bin");

Parameters: filename – [in] name of the file containing the data.

virtual void WriteFileBinary(const std::string &filename) const = 0

Write vector to binary file.

Write a vector to binary file.

The binary format contains a header, the rocALUTION version and the vector data as follows:

// Header
out << "#rocALUTION binary vector file" << std::endl;

// rocALUTION version
out.write((char*)&version, sizeof(int));

// Vector data
out.write((char*)&size, sizeof(int));
out.write((char*)vec_val, size * sizeof(double));

Example

LocalVector<ValueType> vec;

// Allocate and fill vec
// ...

vec.WriteFileBinary("my_vector.bin");

Note

Vector values array is always stored in double precision (e.g. double or std::complex<double>).

Parameters: filename – [in] name of the file to write the data to.
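
The layout listed for WriteFileBinary can be reproduced with a short standalone writer, which may help when producing or inspecting such files from other tools. This is a sketch that follows the listing only: `write_vector_binary` is a hypothetical function, the `version` value is whatever the caller passes in, and the integer widths are taken from the listing rather than from the library source.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical writer following the documented layout:
// header line, version (int), size (int), values (double).
inline void write_vector_binary(std::ostream& out, int version, const std::vector<double>& vec)
{
    assert(vec.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));

    // Header line
    out << "#rocALUTION binary vector file" << std::endl;

    // rocALUTION version
    out.write(reinterpret_cast<const char*>(&version), sizeof(int));

    // Vector size, followed by the values in double precision
    const int size = static_cast<int>(vec.size());
    out.write(reinterpret_cast<const char*>(&size), sizeof(int));
    out.write(reinterpret_cast<const char*>(vec.data()),
              static_cast<std::streamsize>(vec.size() * sizeof(double)));
}
```

Writing into a `std::ostringstream` instead of a file makes the resulting byte layout easy to inspect; a real file should be opened in binary mode.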

virtual void CopyFromAsync(const LocalVector<ValueType> &src): Async copy from another local vector.

virtual void CopyFromFloat(const LocalVector<float> &src): Copy values from another local float vector.

virtual void CopyFromDouble(const LocalVector<double> &src): Copy values from another local double vector.

virtual void CopyFrom(const LocalVector<ValueType> &src, int64_t src_offset, int64_t dst_offset, int64_t size)

Copy vector from another vector with offsets and size.

CopyFrom copies values with specific source and destination offsets and sizes from another vector.

Note

This function allows cross-platform copying. One of the objects could be allocated on the accelerator backend.

Parameters

src – [in] Vector from which the values are copied.
src_offset – [in] source offset.
dst_offset – [in] destination offset.
size – [in] number of entries to be copied.
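
The role of the two offsets and the size can be shown with plain `std::vector` objects. The sketch assumes the natural reading of the parameters, namely that entry `src_offset + i` of the source lands in entry `dst_offset + i` of the destination; `copy_range` is a hypothetical stand-in, not the library call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in: dst[dst_offset + i] = src[src_offset + i] for i in [0, size).
inline void copy_range(const std::vector<double>& src,
                       int64_t                    src_offset,
                       std::vector<double>&       dst,
                       int64_t                    dst_offset,
                       int64_t                    size)
{
    assert(src_offset >= 0 && dst_offset >= 0 && size >= 0);
    assert(src_offset + size <= static_cast<int64_t>(src.size()));
    assert(dst_offset + size <= static_cast<int64_t>(dst.size()));
    std::copy_n(src.begin() + src_offset, size, dst.begin() + dst_offset);
}
```

Both ranges have to fit inside their vectors; the destination is not resized by the copy in this sketch.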

virtual void AddScale(const LocalVector<ValueType> &x, ValueType alpha): Perform vector update of type this = this + alpha * x.

virtual void AddScale(const GlobalVector<ValueType> &x, ValueType alpha): Perform vector update of type this = this + alpha * x.

virtual void ScaleAdd(ValueType alpha, const LocalVector<ValueType> &x): Perform vector update of type this = alpha * this + x.

virtual void ScaleAdd(ValueType alpha, const GlobalVector<ValueType> &x): Perform vector update of type this = alpha * this + x.

virtual void ScaleAddScale(ValueType alpha, const LocalVector<ValueType> &x, ValueType beta): Perform vector update of type this = alpha * this + x * beta.

virtual void ScaleAddScale(ValueType alpha, const GlobalVector<ValueType> &x, ValueType beta): Perform vector update of type this = alpha * this + x * beta.

virtual void ScaleAddScale(ValueType alpha, const LocalVector<ValueType> &x, ValueType beta, int64_t src_offset, int64_t dst_offset, int64_t size): Perform vector update of type this = alpha * this + x * beta with offsets.

virtual void ScaleAddScale(ValueType alpha, const GlobalVector<ValueType> &x, ValueType beta, int64_t src_offset, int64_t dst_offset, int64_t size): Perform vector update of type this = alpha * this + x * beta with offsets.

virtual void ScaleAdd2(ValueType alpha, const LocalVector<ValueType> &x, ValueType beta, const LocalVector<ValueType> &y, ValueType gamma): Perform vector update of type this = alpha * this + x * beta + y * gamma.

virtual void ScaleAdd2(ValueType alpha, const GlobalVector<ValueType> &x, ValueType beta, const GlobalVector<ValueType> &y, ValueType gamma): Perform vector update of type this = alpha * this + x * beta + y * gamma.

virtual void Scale(ValueType alpha) = 0: Perform vector scaling this = alpha * this.
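
The update functions differ only in which operand is scaled, which is easy to mix up. The sketch below spells out the four formulas element by element on `std::vector<double>`; `Vec` and the free functions are hypothetical stand-ins in which `self` plays the role of `this`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>; // hypothetical stand-in for a host vector

// AddScale: self = self + alpha * x
inline void add_scale(Vec& self, const Vec& x, double alpha)
{
    assert(self.size() == x.size());
    for(std::size_t i = 0; i < self.size(); ++i)
        self[i] = self[i] + alpha * x[i];
}

// ScaleAdd: self = alpha * self + x
inline void scale_add(Vec& self, double alpha, const Vec& x)
{
    assert(self.size() == x.size());
    for(std::size_t i = 0; i < self.size(); ++i)
        self[i] = alpha * self[i] + x[i];
}

// ScaleAddScale: self = alpha * self + x * beta
inline void scale_add_scale(Vec& self, double alpha, const Vec& x, double beta)
{
    assert(self.size() == x.size());
    for(std::size_t i = 0; i < self.size(); ++i)
        self[i] = alpha * self[i] + x[i] * beta;
}

// ScaleAdd2: self = alpha * self + x * beta + y * gamma
inline void scale_add2(Vec& self, double alpha, const Vec& x, double beta, const Vec& y, double gamma)
{
    assert(self.size() == x.size() && self.size() == y.size());
    for(std::size_t i = 0; i < self.size(); ++i)
        self[i] = alpha * self[i] + x[i] * beta + y[i] * gamma;
}
```

In short, AddScale scales the argument, ScaleAdd scales the object itself, and the remaining two scale both.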

virtual ValueType Dot(const LocalVector<ValueType> &x) const: Compute dot (scalar) product, return this^T x.

virtual ValueType Dot(const GlobalVector<ValueType> &x) const: Compute dot (scalar) product, return this^T x.

virtual ValueType DotNonConj(const LocalVector<ValueType> &x) const: Compute non-conjugate dot (scalar) product, return this^T x.

virtual ValueType DotNonConj(const GlobalVector<ValueType> &x) const: Compute non-conjugate dot (scalar) product, return this^T x.

virtual ValueType Norm(void) const = 0: Compute \(L_2\) norm of the vector, return = sqrt(this^T this)

virtual ValueType Reduce(void) const = 0: Reduce the vector.

virtual ValueType InclusiveSum(void) = 0: Compute inclusive sum.

virtual ValueType ExclusiveSum(void) = 0: Compute exclusive sum.

virtual ValueType Asum(void) const = 0: Compute the sum of absolute values of the vector, return = sum(|this|)

virtual int64_t Amax(ValueType &value) const = 0: Compute the absolute max of the vector, return = index(max(|this|))
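
The formulas given for Norm, Asum and Amax can be written out for a real-valued host array. The sketch makes two assumptions that the text above does not state: the `value` argument of Amax receives the largest absolute value, and the first occurrence wins on ties. `norm2`, `asum` and `amax` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// L2 norm: sqrt(sum of squares)
inline double norm2(const std::vector<double>& v)
{
    double sum = 0.0;
    for(double x : v)
        sum += x * x;
    return std::sqrt(sum);
}

// Sum of absolute values
inline double asum(const std::vector<double>& v)
{
    double sum = 0.0;
    for(double x : v)
        sum += std::fabs(x);
    return sum;
}

// Index of the entry with the largest absolute value.
// Assumption: value receives that absolute value; ties keep the first index.
inline int64_t amax(const std::vector<double>& v, double& value)
{
    assert(!v.empty());
    int64_t index = 0;
    value         = std::fabs(v[0]);
    for(std::size_t i = 1; i < v.size(); ++i)
    {
        if(std::fabs(v[i]) > value)
        {
            value = std::fabs(v[i]);
            index = static_cast<int64_t>(i);
        }
    }
    return index;
}
```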

virtual void PointWiseMult(const LocalVector<ValueType> &x): Perform point-wise multiplication (element-wise) of this = this * x.

virtual void PointWiseMult(const GlobalVector<ValueType> &x): Perform point-wise multiplication (element-wise) of this = this * x.

virtual void PointWiseMult(const LocalVector<ValueType> &x, const LocalVector<ValueType> &y): Perform point-wise multiplication (element-wise) of this = x * y.

virtual void PointWiseMult(const GlobalVector<ValueType> &x, const GlobalVector<ValueType> &y): Perform point-wise multiplication (element-wise) of this = x * y.

virtual void Power(double power) = 0: Perform power operation to a vector.

Local Matrix

template<typename ValueType> class LocalMatrix : public rocalution::Operator<ValueType>

LocalMatrix class.

A LocalMatrix is called local, because it will always stay on a single system. The system can contain several CPUs via UMA or NUMA memory system or it can contain an accelerator.

A number of matrix formats are supported. These are CSR, BCSR, MCSR, COO, DIA, ELL, HYB, and DENSE.

Note

For CSR type matrices, the column indices must be sorted in increasing order. For COO matrices, the row indices must be sorted in increasing order. The function Check can be used to check whether a matrix contains valid data. For CSR and COO matrices, the function Sort can be used to sort the column or row indices, respectively.

Template Parameters: ValueType – can be int, float, double, std::complex<float> and std::complex<double>
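
The CSR ordering requirement from the note above applies within each row, not across row boundaries. A minimal sketch of such a test on raw CSR arrays follows; `csr_columns_sorted` is a hypothetical helper, it assumes a row offset array with nrow + 1 entries, and it is not a replacement for the library's Check function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check: true if the column indices do not decrease within any row.
// row_offset has nrow + 1 entries; col holds the column index of each non-zero.
inline bool csr_columns_sorted(const std::vector<int>& row_offset, const std::vector<int>& col)
{
    assert(!row_offset.empty());
    assert(static_cast<std::size_t>(row_offset.back()) == col.size());
    for(std::size_t row = 0; row + 1 < row_offset.size(); ++row)
    {
        // Compare neighbouring entries of the same row only
        for(int k = row_offset[row] + 1; k < row_offset[row + 1]; ++k)
        {
            if(col[k - 1] > col[k])
            {
                return false;
            }
        }
    }
    return true;
}
```

For a 2 x 4 matrix with row offsets {0, 2, 4}, the column array {0, 2, 1, 3} is valid because each row is ordered on its own, while {2, 0, 1, 3} is not.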

void AllocateCSR(const std::string &name, int64_t nnz, int64_t nrow, int64_t ncol)

Allocate a local matrix with name and sizes.