Benchmarking suite – Final Report

This is the final report of the project “Automated benchmark suite for numerical libraries in Gentoo” for the Google Summer of Code 2011.

Project description

The project aims to develop a simple yet powerful automated benchmarking system for numerical libraries. The Gentoo software system provides many implementations of widely used standards such as BLAS, CBLAS, LAPACK and ScaLAPACK, as well as other numerical libraries such as FFTW and MKL. The developed tools will help the system maintainer choose the best-suited implementation for the machine hardware and test the same implementation, or different ones, with different compilers, compiler versions and compile flags.

Benchmarking suite – Report 12

This is the report of the project “Automated benchmark suite for numerical libraries in Gentoo” for the week 8 – 15 August.

Testing numbench

This article explains how to install the numbench script and run the benchmarks.

Install app-benchmarks/numbench

First of all, the science overlay has to be added through layman; therefore, install layman on your Gentoo system if you don’t have it already, and run:

layman -L
layman -a science

After that you will be able to install the package app-benchmarks/numbench. Remember that numbench is still unstable, so you will need to install the ~x86/~amd64 version.

The package installs the executable “numbench”, some Python data and a man page numbench(1).

I recommend adding the bicatali overlay through layman too, because it contains many numerical libraries that can be benchmarked, although we are migrating them into the science overlay.

Run the benchmarks

In order to run the benchmarks you have to provide a configuration file. The man page explains how to write one, and you will find some examples under /usr/share/numbench/samples. Once you have your configuration file (say conf.in), and you have chosen the module to test (e.g. blas, lapack or lapack_accuracy; see man numbench or numbench -h), just run the command

numbench module conf.in -s

The documentation explains how to run the test with more parameters in order to choose the tests that have to be performed.

After the execution you will find interesting directories under ~/.benchmarks:

  • log contains the logs; they are divided into subfolders in case of multiple runs
  • packages contains the built packages, which are useful if you decide to install one of the tested ones (the documentation on this is still lacking)
  • reports contains, for each run, a set of images, an HTML page and a copy of the logs; they are ready to be published somewhere, just copy the whole folder into your www directory
  • roots and tests are two directories used by the script for storing data; they are kept so that the tests are not run again if the results already exist
Please let me know if you find any bugs!

Benchmarking suite – Report 11

This is the report of the project “Automated benchmark suite for numerical libraries in Gentoo” for the week 1 – 7 August.

Benchmarking suite – Report 10

This is the report of the project “Automated benchmark suite for numerical libraries in Gentoo” for the week 25 July-1 August.

Benchmarking suite – Report 9

This is the report of the project “Automated benchmark suite for numerical libraries in Gentoo” for the week 18-24 July.

Benchmarking suite – Report 8

A very short log of what has been done during the last days. I am
working on the PBLAS and ScaLAPACK benchmarks, which is a very
challenging topic, because it is very difficult to debug such
applications.

* I changed some parts of the BTL framework, adapting it to the
distributed memory benchmarks. This has required writing two new
perfanalyzers — one for the root process, one for the other (node)
processes. The nodes do not perform any measurement, while the root
process broadcasts the needed information, measures the time and
manages the output (both std{out,err} and resulting file).

* I added a BLACS library that provides a useful interface which
scatters and gathers matrices and vectors. I also added a PBLAS
library that inherits the BLACS one and will support the most common
operations (at the moment just the parallel matrix-vector
multiplication).

* I added an action for the parallel matrix-vector multiplication
which makes use of the two described interfaces.

The matrix-vector multiplication is a case study for now. If
everything goes fine (and it seems so, now), then more actions will be
provided, for both PBLAS and ScaLAPACK, which share the same concepts.
I plan to have a working (but incomplete) Python module for PBLAS by
tomorrow, too.

Milestones for the next week:
* Have working PBLAS and ScaLAPACK modules
* Run some benchmarks using these modules and publish the results
* Start the implementation of the advanced FFTW benchmarks, as
previously described

Best regards
Andrea Arteaga

Mid-term report

This report presents the status of the Google Summer of Code project Automated benchmark suite for numerical libraries in Gentoo before the mid-term evaluation.

Benchmarking suite – Report 7

Hello all.
This week the automated benchmarking suite project has received a lot
of work. During the first half of the week I

* improved the HTML report
* closed a number of bugs
* implemented a module for testing FFTW

Then, I spent a couple of days investigating the distributed-memory
parallel routines of BLACS, PBLAS and ScaLAPACK, and their
availability in Gentoo. I also set up an MPI environment under Gentoo
and decided to share my experience within the documentation of Gentoo,
which is slightly outdated.

Now, some work is being done on the pkg-config interface of the
script. Pkg-config is used by the script to retrieve the compile
instructions when compiling and linking against the numerical
libraries. The new alternatives-2 system relies on this useful tool,
and so does the benchmarking project. Some refinement will allow the
user to customize the tests even further.
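As an illustration of how a script can ask pkg-config for compile and
link flags, here is a minimal Python sketch. It is not the suite's
actual code: the function names and the module name "blas" are
assumptions, and only the standard --cflags and --libs options of
pkg-config are used.

```python
import shlex
import subprocess


def pkgconfig_command(module):
    """Build the pkg-config command line for a module (e.g. 'blas')."""
    return ["pkg-config", "--cflags", "--libs", module]


def split_flags(output):
    """Split the text printed by pkg-config into a list of flags."""
    return shlex.split(output)


def query_flags(module):
    """Run pkg-config and return the flags; raises if the call fails."""
    result = subprocess.run(
        pkgconfig_command(module),
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return split_flags(result.stdout)


# Hypothetical usage; which modules exist depends on the installed .pc files:
#     flags = query_flags("blas")
```

Keeping the command construction separate from the call makes it easy
to log the exact command that was used for a given test.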

Implementing a module for testing FFTW
========================================

This case is a bit special, as FFTW is not a standard that is
reimplemented in optimized libraries (as is the case for BLAS,
CBLAS, LAPACK, …). However, it is useful to compare the speed of FFTW
compiled with different compilers, versions or flags, so there is a
particular module for testing this widely used library.

FFTW can perform a wide range of tasks, and a survey was conducted
[1] asking the community about the most wanted tests. Based on the
results, the following tests have been implemented:
* Forward Discrete Fourier Transform
* Backward Discrete Fourier Transform
* The same two with the flag FFTW_MEASURE instead of FFTW_ESTIMATE

After the mid-term evaluation, 2-dimensional tests will also be
provided. For now, these four tests are available and implemented
within the BTL framework, which makes the benchmarking suite robust.
The tests have been successful so far.
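To give an idea of what a forward/backward transform benchmark
measures, here is a small Python sketch. It uses neither FFTW nor the
BTL framework: a naive O(n^2) DFT written with the standard library
serves as a stand-in, and all function names are hypothetical. With
FFTW, the FFTW_ESTIMATE and FFTW_MEASURE flags only affect how the plan
is created before the timed execution, which this sketch does not
model.

```python
import cmath
import time


def dft(values, sign):
    """Naive unnormalized DFT: sign=-1 is forward, sign=+1 is backward."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
            for j in range(n))
        for k in range(n)
    ]


def time_transform(values, sign, repetitions=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repetitions):
        start = time.perf_counter()
        dft(values, sign)
        best = min(best, time.perf_counter() - start)
    return best


signal = [complex(i % 7, 0) for i in range(64)]
forward = dft(signal, -1)
# The backward transform is unnormalized (as in FFTW), so divide by n.
roundtrip = [x / len(signal) for x in dft(forward, +1)]
max_error = max(abs(a - b) for a, b in zip(signal, roundtrip))
```

Checking the round-trip error alongside the timing helps to catch a
broken build before its speed is compared with others.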

Next week’s milestones
========================

The mid-term evaluation will come soon. For this, I’m performing many
tests in order to present, as mid-term result, a general comprehensive
report which makes clear that the project is already usable. My mentor
reported that my script already allowed him to find some bugs in a
package, which is already a good result for me, even if the purpose
of the script is not (only) finding bugs. My mid-term report will
hopefully persuade many people to run some benchmarks on their
computers.

So, objectives for the next week are:

* Improve the pkg-config interface
* Improve the user configuration possibilities
* Improve the appearance and information content of the HTML reports
* Test the suite as much as possible in order to generate a
comprehensive set of reports for each supported library (BLAS, CBLAS,
LAPACK, FFTW)

Best regards
Andrea Arteaga

[1] https://spreadsheets.google.com/spreadsheet/viewform?hl=en_US&formkey=dGlKbjBfLW5uaGt4QnQxNFJYUVp1QlE6MQ#gid=0

Gentoo automated benchmarks

This document explains how to run the benchmarking script and refers to the state of the git repository as of 4 July 2011.

Retrieval

You can install the project by just cloning the git repository: git clone git://git.overlays.gentoo.org/proj/auto-numerical-bench. You will also need:

  • A python interpreter at version 2.6 or 2.7;
  • The portage and gentoolkit packages;
  • Only the new-style eselect with alternatives is supported, so you will need the bicatali overlay;
  • For the graphical reports you will also need matplotlib (compiled with libpng) and numpy;
  • The packages can be tested only if all their dependencies are already installed; so, if you want to test eigen, for instance, you will need to install all its dependencies beforehand.
Once you have cloned the repository, enter the directory app-benchmarks/autobench/files/python (or the name you have chosen) in order to run it.
Another option is to use layman to install the repository as an overlay and then emerge autobench-9999 (which is ~x86, ~amd64), but I won’t explain this here.

Execution

The script generates binary packages by running portage with a special environment. It then emerges the packages individually into a directory, compiles a standard benchmarking program, runs it and collects the results. A single package can be emerged many times with different compiler flags or compilers. For example, one could test sci-libs/atlas-3.9.41 three times:

  • Using gfortran-4.5.2 with FFLAGS=-O3
  • Using gfortran-4.6.0 with FFLAGS="-O2 -fschedule-insns"
  • Using ifort (any version) with the default FFLAGS
In this case one has to provide a configuration file formatted as follows:

atlas-gcc-452 sci-libs/atlas-3.9.41 FC=gfortran-4.5.2 FFLAGS=-O3
atlas-gcc-460 sci-libs/atlas-3.9.41 FC=gfortran-4.6.0 FFLAGS="-O2 -fschedule-insns"
atlas-icc sci-libs/atlas-3.9.41 FC=ifort

Each row defines a configuration and is formatted as follows:
  • The first part is a string of alphanumeric characters that identifies the configuration.
  • The second part is the package to test; in the example it is fully qualified through category/package-version, but it is not mandatory, although it is the best procedure. In case of ambiguity (e.g. more installable versions) every package is installed and tested separately. For example, sci-libs/atlas would test both sci-libs/atlas-3.8.4 and sci-libs/atlas-3.9.41
  • Everything after the package is the environment to use while emerging the package.
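
The three-part row format described above can be sketched in Python as follows. This is only an illustration, not the parser used by the script: the function name is hypothetical, and handling quoted values with shlex is an assumption about how quoting is meant to work.

```python
import shlex


def parse_config_line(line):
    """Split one configuration row into (identifier, package, environment)."""
    fields = shlex.split(line)
    if len(fields) < 2:
        raise ValueError("expected an identifier and a package: %r" % line)
    identifier, package = fields[0], fields[1]
    env = {}
    for item in fields[2:]:
        # Everything after the package is a KEY=VALUE environment setting.
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("expected KEY=VALUE, got %r" % item)
        env[key] = value
    return identifier, package, env


row = 'atlas-gcc-460 sci-libs/atlas-3.9.41 FC=gfortran-4.6.0 FFLAGS="-O2 -fschedule-insns"'
identifier, package, env = parse_config_line(row)
```

Using shlex keeps a quoted value such as the FFLAGS one together as a single environment entry instead of splitting it at the space.
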
The configuration file can be stored anywhere. Now, we come to the script.
The script is the main.py file in the directory. It has to be called with the following syntax:

python2 main.py [library] [conffile] [tests]

Where:
  • [library] can be
    • blas – currently supported
    • cblas – currently supported
    • lapack – currently supported
    • lapacke
    • scalapack
    • blacs
  • [conffile] is the described configuration file
  • [tests] is a list of tests to be performed during the benchmark. For blas and cblas the following tests are available:
    • Level 1:
      • axpy – standard
      • axpby
      • rot
    • Level 2:
      • matrix_vector – standard
      • atv
      • symv
      • syr2
      • ger
      • trisolve_vector – standard
    • Level 3:
      • matrix_matrix – standard
      • aat
      • trisolve_matrix
      • trmm
  • For lapack the following are available:
    • general_solve: solves a general square linear system of equations
    • lu_decomp: computes the full-pivoting LU decomposition of a general square matrix
    • least_squares: solves the least squares problem (for the benchmarks a square matrix is considered)
    • cholesky: computes the Cholesky decomposition of a symmetric positive-definite (SPD) matrix
    • symm_ev: computes the eigenvalues of a symmetric matrix
The standard tests are performed if no tests are given on the command line. For lapack, all tests are standard.
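
As a small illustration of the calling syntax, the following Python sketch assembles the argument list for a run. The helper and its name are hypothetical and not part of the project; it only accepts the three libraries marked as currently supported in the list above.

```python
# Libraries marked "currently supported" in the list above.
SUPPORTED_LIBRARIES = ("blas", "cblas", "lapack")


def build_command(library, conffile, tests=()):
    """Return the arguments for: python2 main.py [library] [conffile] [tests]."""
    if library not in SUPPORTED_LIBRARIES:
        raise ValueError("library not currently supported: %s" % library)
    # With an empty test list, the script falls back to the standard tests.
    return ["python2", "main.py", library, conffile] + list(tests)


command = build_command("blas", "conf.in", ["axpy", "matrix_matrix"])
```

The resulting list can be passed to a process launcher or joined with spaces to obtain the command line to type.
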

Directories

The script can be run as a regular user or as superuser. In the first case the packages will be stored in ~/.benchmarks/packages; in the latter, they are stored in /var/cache/benchmarks/packages. In both cases, the tests are run in /var/tmp/benchmarks/roots and the temporary results are stored in /var/tmp/benchmarks/tests.

Log files are stored in /var/log/benchmarks. There you will find one directory for each run of the script. Almost everything is logged.

The results are stored in /var/cache/benchmarks/results if the user is root and in ~/.benchmarks/results otherwise. If the switch -s is given, a summary plot is also generated. If the switch -S is given, only the summary plot is generated. The results include a plot as a PNG image for every operation, the summary image if requested and an HTML page with all the plots.
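
The directory layout described above can be summarized in a short Python sketch. The function name and the dictionary keys are hypothetical; the paths themselves are the ones listed in this section.

```python
import posixpath


def benchmark_dirs(is_root, home):
    """Return the directories used for a root or a regular-user run."""
    if is_root:
        base = "/var/cache/benchmarks"
    else:
        base = posixpath.join(home, ".benchmarks")
    return {
        "packages": posixpath.join(base, "packages"),
        "results": posixpath.join(base, "results"),
        # These locations are the same for both kinds of user.
        "roots": "/var/tmp/benchmarks/roots",
        "tests": "/var/tmp/benchmarks/tests",
        "logs": "/var/log/benchmarks",
    }


user_dirs = benchmark_dirs(False, "/home/user")
root_dirs = benchmark_dirs(True, "/root")
```

Only the packages and results locations depend on the user; the roots, tests and logs directories are shared.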
