The example continues

Yesterday we saw how to scatter a matrix that resides on one process among all the processes. Now I want to make the code clearer and encapsulate it in a function. I call the function dscatter; its parameters are the following:

  • const int& context: input, the blacs context
  • const double* const& GlobalMatrix: input, only relevant for the root process; on the other processes it is safe to pass an arbitrary pointer or NULL
  • double*& LocalMatrix: output; the value passed in is ignored and overwritten with a newly allocated array, so after the call the user has to free the memory using delete[].
  • int& GlobalRows: input for root, output for the other processes; after the execution contains the global number of rows
  • int& GlobalCols: input for root, output for the other processes; after the execution contains the global number of columns
  • int& BlockRows: input for root, output for the other processes; after the execution contains the number of rows in each block
  • int& BlockCols: input for root, output for the other processes; after the execution contains the number of columns in each block
  • int& LocalRows: output, after the execution contains the number of rows of the local matrix
  • int& LocalCols: output, after the execution contains the number of columns of the local matrix
  • const int& root: input, the BLACS id of the process that owns the global matrix.
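The parameter list above translates into a signature like the one below. This is only a sketch: the void return type is an assumption (the list does not mention one), and the body is a single-process stand-in rather than the BLACS implementation. On a 1-by-1 process grid the local matrix is simply a copy of the global one, which is enough to show the calling convention and who owns the memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Signature assembled from the parameter list. The void return type is an
// assumption. The body is a stand-in for a 1x1 process grid only: a real
// implementation would send blocks to the other processes through BLACS.
void dscatter(const int& context,
              const double* const& GlobalMatrix,
              double*& LocalMatrix,
              int& GlobalRows, int& GlobalCols,
              int& BlockRows, int& BlockCols,
              int& LocalRows, int& LocalCols,
              const int& root)
{
    // Unused in the single-process stand-in.
    (void)context; (void)root; (void)BlockRows; (void)BlockCols;

    assert(GlobalMatrix != nullptr && GlobalRows >= 0 && GlobalCols >= 0);

    // With one process, the local matrix has the global dimensions.
    LocalRows = GlobalRows;
    LocalCols = GlobalCols;

    // The incoming value of LocalMatrix is ignored and overwritten.
    const std::size_t n = static_cast<std::size_t>(LocalRows) * LocalCols;
    LocalMatrix = new double[n];   // the caller frees this with delete[]
    std::copy(GlobalMatrix, GlobalMatrix + n, LocalMatrix);
}
```

The caller passes any double* variable as LocalMatrix, reads LocalRows and LocalCols after the call, and releases the array with delete[] when done.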

An example of BLACS with C++

I’m shocked by the lack of examples or guides on the web regarding BLACS, PBLAS and ScaLAPACK, so I decided to post some examples here. Many people (myself among them) use C or C++ instead of Fortran and therefore need either a way to call the Fortran routines from there or a more comfortable C interface. Calling Fortran is not so difficult: the function arguments are always pointers, and the function names are usually in lower case with a trailing underscore.
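As an illustration of that calling convention, here is how ScaLAPACK's NUMROC could be declared from C++. To keep the sketch self-contained, it also defines a local stand-in with the same name that follows my understanding of the NUMROC algorithm; in a real program you would delete that definition and link against ScaLAPACK instead. The wrapper name numroc is hypothetical, and the exact symbol name (lower case, trailing underscore) depends on the Fortran compiler.

```cpp
#include <cassert>

// Fortran-style declaration: lower-case name, trailing underscore,
// every argument passed by pointer.
extern "C" int numroc_(const int* n, const int* nb, const int* iproc,
                       const int* isrcproc, const int* nprocs);

// Local stand-in so the example links without ScaLAPACK (an assumption for
// illustration; remove it when linking against the real library).
// Returns how many of the n rows (or columns), split into blocks of size nb
// and dealt cyclically to nprocs processes, land on process iproc.
extern "C" int numroc_(const int* n, const int* nb, const int* iproc,
                       const int* isrcproc, const int* nprocs)
{
    assert(*nb > 0 && *nprocs > 0);
    const int mydist  = (*nprocs + *iproc - *isrcproc) % *nprocs;
    const int nblocks = *n / *nb;                  // number of full blocks
    int count         = (nblocks / *nprocs) * *nb; // full rounds of dealing
    const int extra   = nblocks % *nprocs;         // leftover full blocks
    if (mydist < extra)       count += *nb;        // gets one more full block
    else if (mydist == extra) count += *n % *nb;   // gets the partial block
    return count;
}

// Hypothetical C++ wrapper that hides the pointer arguments.
inline int numroc(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    return numroc_(&n, &nb, &iproc, &isrcproc, &nprocs);
}
```

For instance, numroc(9, 2, 0, 0, 2) gives the number of rows that process row 0 holds when 9 rows are dealt in blocks of 2 to 2 process rows.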

In this example I will load a matrix from a file into the root process, scatter it among the processes according to the block-cyclic pattern, print the local matrices, then gather the local matrices onto the root process and check that the original matrix and the gathered matrix are the same. For that I will use MPI, BLACS with its C interface, and one helper routine from ScaLAPACK (just numroc).
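To make the block-cyclic pattern concrete, the following sketch computes, in one dimension, which process owns a given global index and where that index ends up in the owner's local storage. The function names owner and local_index are mine, and the sketch assumes the first block goes to process 0; the same rule is applied independently to rows and to columns.

```cpp
#include <cassert>

// Process coordinate that owns global index i when blocks of `block`
// consecutive indices are dealt cyclically to `nprocs` processes,
// starting at process 0 (assumption).
int owner(int i, int block, int nprocs)
{
    assert(i >= 0 && block > 0 && nprocs > 0);
    return (i / block) % nprocs;
}

// Position of global index i inside its owner's local storage.
int local_index(int i, int block, int nprocs)
{
    assert(i >= 0 && block > 0 && nprocs > 0);
    const int round = i / (block * nprocs); // full dealing rounds before i
    return round * block + i % block;       // offset inside the current block
}
```

With 9 rows, blocks of 2 and 2 process rows, process 0 holds global rows 0, 1, 4, 5, 8 and process 1 holds rows 2, 3, 6, 7.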

Before I begin with the example, two remarks:

  • The example is useful for explanatory purposes, but in fact it is not so useful in real-life work: most often we use distributed-memory programming because the data would not fit into the memory of a single computer. For instance, a computation that involves a 500,000-by-500,000 matrix of doubles needs 250,000,000,000 doubles, which amounts to 2,000,000,000,000 bytes = 2 TB. There is no single computer with that much memory (as far as I know…). Therefore it makes no sense to load the whole matrix into the root process. But we will most probably test tiny matrices, so forget this for now.
  • If you landed here and decide to continue reading, PLEASE leave a message in the comments. Criticism, suggestions and comments are very welcome.
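The memory estimate in the first remark is easy to check in code. This is a small sketch with a hypothetical helper name; it assumes sizeof(double) is 8, as on all common platforms.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to store a dense rows-by-cols matrix of doubles.
std::uint64_t dense_matrix_bytes(std::uint64_t rows, std::uint64_t cols)
{
    // Guard against overflow of the 64-bit product.
    const std::uint64_t limit = UINT64_MAX / sizeof(double);
    assert(rows == 0 || cols <= limit / rows);
    return rows * cols * sizeof(double);
}
```

Calling dense_matrix_bytes(500000, 500000) reproduces the 2,000,000,000,000 bytes quoted above when doubles are 8 bytes wide.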

You can find the official documentation of BLACS on netlib:

It is not always very clear, but once you have become familiar with BLACS, it is a good quick reference.

You will find more examples in the Parallel Computing category on this blog. Stay tuned if you are interested!
