The example continues

Yesterday we saw how to scatter a matrix that resides on one process among all the processes. Now I want to make the code clearer and encapsulate it in a function. I call the function dscatter, and its parameters are the following:

  • const int& context: input, the blacs context
  • const double* const& GlobalMatrix: input, only relevant for the root process; for the other processes it is safe to pass an arbitrary pointer or NULL
  • double*& LocalMatrix: output; the value passed in is ignored and overwritten with a pointer to newly allocated memory, so after the execution the user has to free that memory using delete[].
  • int& GlobalRows: input for root, output for the other processes; after the execution contains the global number of rows
  • int& GlobalCols: input for root, output for the other processes; after the execution contains the global number of columns
  • int& BlockRows: input for root, output for the other processes; after the execution contains the number of rows in each block
  • int& BlockCols: input for root, output for the other processes; after the execution contains the number of columns in each block
  • int& LocalRows: output, after the execution contains the number of rows of the local matrix
  • int& LocalCols: output, after the execution contains the number of columns of the local matrix
  • const int& root: input, the BLACS id of the process that owns the global matrix.

An example of BLACS with C++

I’m shocked by the lack of examples and guides on the web for BLACS, PBLAS and ScaLAPACK, so I decided to post some examples here. Many people (me included) use C or C++ instead of Fortran, and therefore need either a way to call the Fortran routines from there or a more comfortable C interface. Calling the Fortran routines is not so difficult: the function arguments are always pointers, and the function names are usually in lower case with a trailing underscore.
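As a small illustration of that calling convention, here is how the ScaLAPACK routine NUMROC can be declared and called from C++. Take it as a sketch: the trailing underscore depends on the Fortran compiler, the wrapper local_count and the macro HAVE_SCALAPACK are names I made up for this example, and the stand-in definition is there only so that the file links without the library (it follows the same arithmetic as NUMROC).

```cpp
#include <cassert>

// Fortran: INTEGER FUNCTION NUMROC(N, NB, IPROC, ISRCPROC, NPROCS)
// Seen from C++: lower-case name, trailing underscore, every argument a pointer.
extern "C" int numroc_(const int* n, const int* nb, const int* iproc,
                       const int* isrcproc, const int* nprocs);

#ifndef HAVE_SCALAPACK  // hypothetical macro: define it when linking the real library
// Stand-in so that this file links on its own.
extern "C" int numroc_(const int* n, const int* nb, const int* iproc,
                       const int* isrcproc, const int* nprocs) {
    const int mydist = (*nprocs + *iproc - *isrcproc) % *nprocs;
    const int nblocks = *n / *nb;             // number of complete blocks
    int count = (nblocks / *nprocs) * *nb;    // blocks every process gets
    const int extra = nblocks % *nprocs;      // leftover complete blocks
    if (mydist < extra) count += *nb;               // one more complete block
    else if (mydist == extra) count += *n % *nb;    // the last, partial block
    return count;
}
#endif

// Hypothetical convenience wrapper: takes values, passes their addresses.
inline int local_count(int n, int nb, int iproc, int isrcproc, int nprocs) {
    assert(nb > 0 && nprocs > 0);
    return numroc_(&n, &nb, &iproc, &isrcproc, &nprocs);
}
```

For instance, with 9 rows, blocks of 2 rows and 2 process rows, local_count(9, 2, 0, 0, 2) gives 5 and local_count(9, 2, 1, 0, 2) gives 4.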

In this example I will load a matrix from a file into the root process, scatter it among the processes according to the block-cyclic pattern, print the local matrices, then gather the local matrices back onto the root process and check that the original matrix and the gathered matrix are the same. For that I will use MPI, BLACS with its C interface, and one helper routine from ScaLAPACK (just numroc).
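To make the block-cyclic pattern concrete, here is a minimal sketch of the index mapping along one dimension. It assumes 0-based indices and that the first block lives on process 0; the names BlockCyclicPos and block_cyclic_pos are mine, not part of BLACS or ScaLAPACK. For a matrix the same mapping is applied independently to the rows (over the process rows) and to the columns (over the process columns).

```cpp
#include <cassert>

// Where a global index ends up in a one-dimensional block-cyclic distribution.
struct BlockCyclicPos {
    int owner;        // coordinate of the process that stores the element
    int local_index;  // position of the element in that process's local storage
};

// g: global index (0-based), nb: block size, nprocs: processes in this dimension.
// Assumption: block 0 is stored on process 0.
inline BlockCyclicPos block_cyclic_pos(int g, int nb, int nprocs) {
    assert(g >= 0 && nb > 0 && nprocs > 0);
    const int block = g / nb;  // index of the global block containing g
    // Blocks are dealt out round-robin; on each process they are stored
    // one after the other.
    return BlockCyclicPos{block % nprocs, (block / nprocs) * nb + g % nb};
}
```

With blocks of size 2 on 2 processes, global index 3 lands on process 1 at local index 1, and global index 8 lands on process 0 at local index 4.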

Before I begin with the example, two remarks:

  • The example is useful for explanatory purposes, but in fact not so useful in real-life work: most often we use distributed-memory programming because the data would not fit into the memory of a single computer. For instance, a computation that involves a 500,000-by-500,000 matrix of doubles needs 250,000,000,000 doubles, which at 8 bytes each comes to 2,000,000,000,000 bytes = 2 TB. There is no single computer with that much memory (as far as I know…). Therefore it makes no sense to load the whole matrix into the root process. But we will most probably test with tiny matrices, so forget this for now.
  • If you landed here and decide to continue reading, PLEASE, leave a message in the comments. Criticism, suggestions and comments are very welcome.
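The memory estimate in the first remark can be checked with a few lines of C++. The helper name matrix_bytes is mine, and the figure assumes an 8-byte double, which holds on the usual platforms.

```cpp
#include <cstdint>

// Bytes needed to store a dense rows-by-cols matrix of doubles.
// 64-bit arithmetic is required: the element count does not fit in a 32-bit int.
constexpr std::uint64_t matrix_bytes(std::uint64_t rows, std::uint64_t cols) {
    return rows * cols * sizeof(double);
}

// 500,000 x 500,000 doubles = 2,000,000,000,000 bytes = 2 TB (decimal).
static_assert(sizeof(double) == 8, "the estimate assumes 8-byte doubles");
static_assert(matrix_bytes(500000, 500000) == 2000000000000ULL, "2 TB");
```

The same function is handy for deciding how large a test matrix can be before it stops fitting on the root process.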

You can find the official documentation of BLACS on netlib:

It is not always very clear, but once you have become familiar with BLACS, it is a good quick reference.

You will find more examples in the Parallel Computing category on this blog. Stay tuned if you are interested!