Benchmarking suite – Report 11

This is the report of the project “Automated benchmark suite for numerical libraries in Gentoo” for the week 1 – 7 August.

Project description

The project aims to develop a simple yet powerful automated benchmarking system for numerical libraries. Gentoo provides many implementations of widely used standards such as BLAS, CBLAS, LAPACK and ScaLAPACK, as well as other numerical libraries such as FFTW and MKL. The developed tools will help the system maintainer choose the best-suited implementation for the machine hardware, and test the same implementation, or different ones, with different compilers, compiler versions and compile flags.

Benchmarking suite – Report 10

This is the report of the project “Automated benchmark suite for numerical libraries in Gentoo” for the week 25 July-1 August.

Benchmarking suite – Report 9

This is the report of the project “Automated benchmark suite for numerical libraries in Gentoo” for the week 18-24 July.

Benchmarking suite – Report 8

A very short log of what has been done over the last few days. I am
working on the PBLAS and ScaLAPACK benchmarks, which is a challenging
topic because distributed-memory applications are very difficult to
debug.

* I changed some parts of the BTL framework, adapting it to the
distributed memory benchmarks. This required writing two new
perfanalyzers: one for the root process, one for the other (node)
processes. The nodes do not perform any measurement, while the root
process broadcasts the needed information, measures the time and
manages the output (both std{out,err} and the resulting file).
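
The split between the two perfanalyzers can be sketched as follows. This is only an illustration under simplifying assumptions: RootAnalyzer and NodeAnalyzer are hypothetical names, not the BTL classes, and the MPI communication (the broadcast of the repetition count, the output handling) is left out, so the repetition count is simply passed in.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the root perfanalyzer: it runs the action
// and is the only one that measures the elapsed time.
struct RootAnalyzer {
    template <class Action>
    double run(Action& action, int repetitions) const {
        assert(repetitions > 0);
        const auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < repetitions; ++i) action();
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        return elapsed.count() / repetitions;  // seconds per call
    }
};

// Hypothetical stand-in for the node perfanalyzer: it takes part in the
// computation the same number of times, but measures and reports nothing.
struct NodeAnalyzer {
    template <class Action>
    void run(Action& action, int repetitions) const {
        assert(repetitions > 0);
        for (int i = 0; i < repetitions; ++i) action();
    }
};
```

In a real distributed run every process would pick one of the two according to its rank, and all processes would have to agree on the number of repetitions before starting.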

* I added a BLACS library that provides a useful interface which
scatters and gathers matrices and vectors. I also added a PBLAS
library that inherits from the BLACS one and will support the most
common operations (at the moment just the parallel matrix-vector
multiplication).

* I added an action for the parallel matrix-vector multiplication
which makes use of the two described interfaces.

The matrix-vector multiplication is a case study for now. If
everything goes fine (and so far it seems so), more actions will be
provided for both PBLAS and ScaLAPACK, which share the same concepts.
I also plan to have a working (but incomplete) Python module for
PBLAS by tomorrow.

Milestones for the next week:
* Have working PBLAS and ScaLAPACK modules
* Run some benchmarks using these modules and publish the results
* Start the implementation of the advanced FFTW benchmarks, as
previously described

Best regards
Andrea Arteaga

Mid-term report

This report presents the status of the Google Summer of Code project Automated benchmark suite for numerical libraries in Gentoo before the mid-term evaluation.

Benchmarking suite – Report 7

Hello all.
This week a lot of work went into the automated benchmarking suite
project. During the first half of the week I

* improved the HTML report
* closed a number of bugs
* implemented a module for testing FFTW

Then I spent a couple of days investigating the distributed-memory
parallel routines of BLACS, PBLAS and ScaLAPACK, and their
availability in Gentoo. I also set up an MPI environment under Gentoo
and decided to contribute my experience to the Gentoo documentation,
which is slightly outdated.

Now some work is being done on the pkg-config interface of the
script. Pkg-config is used by the script to retrieve the compile
and link flags needed to build against the numerical
libraries. The new alternatives-2 system relies on this useful tool,
as does the benchmarking project. Some refinement will allow the
user to customize the tests even further.

Implementing a module for testing FFTW
========================================

This case is a bit special, as FFTW is not a standard that is
reimplemented in optimized libraries (as is the case for BLAS,
CBLAS, LAPACK,…). However, it is useful to compare the speed of FFTW
compiled with different compilers, versions or flags, so there is a
dedicated module for testing this widely used library.

FFTW can perform a wide range of tasks, so I ran a survey [1] asking
the community which tests were most wanted. Based on the results,
the following tests have been implemented:
* Forward Discrete Fourier Transform
* Backward Discrete Fourier Transform
* The same two with the flag FFTW_MEASURE instead of FFTW_ESTIMATE

After the mid-term evaluation, 2-dimensional tests will also be
provided. For now these four tests are available and implemented within
the BTL framework, which makes the benchmarking suite robust. The tests
have been successful so far.
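
To make the forward/backward pair concrete, here is a small reference sketch in plain C++. It does not call FFTW and is not part of the suite; naive_dft and round_trip_error are hypothetical helper names. It only mirrors FFTW's documented conventions: the forward transform uses the exponent sign -1, the backward one +1, and neither is normalized, so a forward transform followed by a backward one returns the input scaled by n. A check like this can be handy for validating the output of a benchmark action on small sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Naive O(n^2) discrete Fourier transform, unnormalized.
// sign = -1 gives the forward transform, sign = +1 the backward one.
cvec naive_dft(const cvec& in, int sign) {
    assert(sign == -1 || sign == +1);
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    cvec out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            // Reduce j*k modulo n to keep the angle small and accurate.
            const double angle =
                sign * 2.0 * pi * double((j * k) % n) / double(n);
            sum += in[j] * std::polar(1.0, angle);
        }
        out[k] = sum;
    }
    return out;
}

// Largest deviation between backward(forward(x)) / n and x.
double round_trip_error(const cvec& x) {
    const cvec y = naive_dft(naive_dft(x, -1), +1);
    double err = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        err = std::max(err, std::abs(y[i] / double(x.size()) - x[i]));
    return err;
}
```

For any small input the round-trip error should be at the level of floating-point rounding.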

Next week’s milestones
========================

The mid-term evaluation is coming soon. For this, I'm running many
tests in order to present, as a mid-term result, a comprehensive
report which makes clear that the project is already usable. My mentor
reported that my script already allowed him to find some bugs in a
package, which is already a good result for me, even though finding
bugs is not the (only) purpose of the script. My mid-term report will
hopefully persuade many people to run some benchmarks on their
computers.

So, the objectives for the next week are:

* Improve the pkg-config interface
* Improve the user configuration possibilities
* Improve the appearance and the information content of the HTML reports
* Test the suite as much as possible in order to generate a
comprehensive set of reports for each supported library (BLAS, CBLAS,
LAPACK, FFTW)

Best regards
Andrea Arteaga

[1] https://spreadsheets.google.com/spreadsheet/viewform?hl=en_US&formkey=dGlKbjBfLW5uaGt4QnQxNFJYUVp1QlE6MQ#gid=0

The example continues

Yesterday we saw how to scatter a matrix that resides on a single process among all the processes. Now I want to make the code clearer and encapsulate it in a function. I call the function dscatter, and these are its parameters:

  • const int& context: input, the blacs context
  • const double* const& GlobalMatrix: input, only relevant for the root process; for the other processes it is safe to pass an arbitrary pointer or NULL
  • double*& LocalMatrix: output; the value passed in is ignored and overwritten with a pointer to newly allocated memory, so after the execution the user has to free the memory using delete[].
  • int& GlobalRows: input for root, output for the other processes; after the execution contains the global number of rows
  • int& GlobalCols: input for root, output for the other processes; after the execution contains the global number of columns
  • int& BlockRows: input for root, output for the other processes; after the execution contains the number of rows in each block
  • int& BlockCols: input for root, output for the other processes; after the execution contains the number of columns in each block
  • int& LocalRows: output, after the execution contains the number of rows of the local matrix
  • int& LocalCols: output, after the execution contains the number of columns of the local matrix
  • const int& root: input, the BLACS id of the process that owns the global matrix.
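
Put together, the parameter list corresponds to a prototype like the one below. The snippet also sketches how LocalRows and LocalCols can be obtained for a block-cyclic distribution. The helper local_extent is a hypothetical name introduced here for illustration; it follows the same rule as ScaLAPACK's numroc routine, under the assumption that the first block lives on process row (or column) 0. It is not necessarily how dscatter computes these values.

```cpp
#include <cassert>

// Prototype matching the parameter list above (declaration only; the
// body with the BLACS calls is not shown here).
void dscatter(const int& context,
              const double* const& GlobalMatrix,
              double*& LocalMatrix,
              int& GlobalRows, int& GlobalCols,
              int& BlockRows, int& BlockCols,
              int& LocalRows, int& LocalCols,
              const int& root);

// Hypothetical helper: number of rows (or columns) owned by process
// `proc` out of `nprocs` when `n` rows (or columns) are dealt out
// block-cyclically in blocks of size `block`, starting at process 0.
int local_extent(int n, int block, int proc, int nprocs) {
    assert(n >= 0 && block > 0 && nprocs > 0);
    assert(proc >= 0 && proc < nprocs);
    const int full_blocks = n / block;                 // complete blocks
    int extent = (full_blocks / nprocs) * block;       // equal share
    const int extra_blocks = full_blocks % nprocs;     // leftover blocks
    if (proc < extra_blocks) {
        extent += block;        // this process gets one more full block
    } else if (proc == extra_blocks) {
        extent += n % block;    // this process gets the partial block
    }
    return extent;
}
```

For example, with 9 rows, blocks of 2 rows and 2 process rows, process row 0 holds rows 0-1, 4-5 and 8 (5 rows) and process row 1 holds rows 2-3 and 6-7 (4 rows). The local matrix then needs LocalRows * LocalCols elements.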