<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Spiros</title>
	<atom:link href="http://andyspiros.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://andyspiros.wordpress.com</link>
	<description>Scientific computing and...</description>
	<lastBuildDate>Wed, 12 Oct 2011 16:14:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='andyspiros.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://1.gravatar.com/blavatar/f5c2ced9f99b12bc6ba8b2e385f5ce1d?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>Spiros</title>
		<link>http://andyspiros.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://andyspiros.wordpress.com/osd.xml" title="Spiros" />
	<atom:link rel='hub' href='http://andyspiros.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Playing with C++0x</title>
		<link>http://andyspiros.wordpress.com/2011/09/09/playing-with-c0x/</link>
		<comments>http://andyspiros.wordpress.com/2011/09/09/playing-with-c0x/#comments</comments>
		<pubDate>Fri, 09 Sep 2011 09:45:12 +0000</pubDate>
		<dc:creator>andyspiros</dc:creator>
				<category><![CDATA[Programmazione]]></category>
		<category><![CDATA[c++0x]]></category>
		<category><![CDATA[gcc]]></category>
		<category><![CDATA[tuples]]></category>

		<guid isPermaLink="false">http://andyspiros.wordpress.com/?p=607</guid>
		<description><![CDATA[The new C++0x standard (sometimes called C++11) provides interesting new features, like variadic templates and tuples. So I&#8217;m experimenting a bit to test the technical requirements for putting into practice the proposal for a new BTL, and I hit a problem in GCC (I guess). Assume we have a set of Interfaces [...]]]></description>
			<content:encoded><![CDATA[<p>The new <a href="http://en.wikipedia.org/wiki/C%2B%2B0x">C++0x standard</a> (sometimes called C++11) provides interesting new features, like <strong><a href="http://en.wikipedia.org/wiki/C%2B%2B0x#Variadic_templates">variadic templates</a></strong> and <strong><a href="http://en.wikipedia.org/wiki/C%2B%2B0x#Tuple_types">tuples</a></strong>. So I&#8217;m experimenting a bit to test the technical requirements for putting into practice the <a title="Proposal for a new version of the BTL" href="http://andyspiros.wordpress.com/2011/09/01/for-a-new-version-of-the-btl/">proposal for a new BTL</a>, and I hit a problem in GCC (I guess).<span id="more-607"></span></p>
<p>Assume we have a set of <em>Interfaces</em>, I1 and I2, that implement a method print(), which prints the name of the interface. We also have a set of class templates, the <em>Actions</em> A1 and A2, which take an <em>Interface</em> (a class) as template parameter and implement a method print() that prints the name of the action followed by the name of the interface. The code is the following (assuming that iostream is #included and cout is imported into the global namespace):</p>
<p><pre class="brush: cpp;">
template&lt;typename Interface&gt;
struct A1 {
	void print() {
		cout &lt;&lt; &quot;A1 - &quot;;
		Interface i;
		i.print();
	}
};

template&lt;typename Interface&gt;
struct A2 {
	void print() {
		cout &lt;&lt; &quot;A2 - &quot;;
		Interface i;
		i.print();
	}
};

struct I1 {
	void print() {
		cout &lt;&lt; &quot;I1\n&quot;;
	}
};
struct I2 {
	void print() {
		cout &lt;&lt; &quot;I2\n&quot;;
	}
};
</pre></p>
<p>Now we can write something like &#8220;A1&lt;I2&gt; action; action.print();&#8221; somewhere and get &#8220;A1 - I2&#8221; as a result. We can also write a function that takes an action and a set of interfaces (a variadic template) and applies the same action to all these interfaces. The following code demonstrates this by making use of a template template parameter:</p>
<p><pre class="brush: cpp;">
template&lt;template&lt;typename I&gt; class Action&gt;
void g1() {
	cout &lt;&lt; &quot;End of iteration\n&quot;;
}

template&lt;template&lt;typename I&gt; class Action, class Interface1, class ... Interfaces&gt;
void g1() {
	Action&lt;Interface1&gt; action;
	action.print();
	g1&lt;Action, Interfaces...&gt;();
}

template&lt;template&lt;typename I&gt; class Action, class ... Interfaces&gt;
void f1() {
	cout &lt;&lt; &quot;Begin iteration\n&quot;;
	g1&lt;Action, Interfaces...&gt;();
}
int main()
{
	f1&lt;A1, I1, I2&gt;();
}
</pre></p>
<p>The output is as expected:</p>
<pre>Begin iteration
A1 - I1
A1 - I2
End of iteration</pre>
<p>But now we want to do the opposite: given a single interface, apply many actions to it (in the end, what we want is to pass many actions and many interfaces, but let&#8217;s proceed in small steps). In this case we need to mix variadic templates and template template parameters, in the following way:</p>
<p><pre class="brush: cpp;">
template&lt;class Interface&gt;
void g2() {
	cout &lt;&lt; &quot;End of iteration\n&quot;;
}

template&lt;class Interface, template&lt;class I&gt; class Action1, template&lt;class I&gt; class ... Actions&gt;
void g2() {
	Action1&lt;Interface&gt; action;
	action.print();
	g2&lt;Interface, Actions...&gt;();
}

template&lt;class Interface, template&lt;class I&gt; class ... Actions&gt;
void f2() {
	cout &lt;&lt; &quot;Begin iteration\n&quot;;
	g2&lt;Interface, Actions...&gt;();
}
</pre></p>
<p>This is just the same example transposed to a single Interface and multiple Actions, and in theory it should work by just calling e.g. f2&lt;I1, A1, A2&gt;(); in the main function. But gcc reports an internal compiler error when compiling it:</p>
<pre>variadic_template_template.cpp: In function ‘void f2() [with Interface = I1, Actions = A1, A2]’:
variadic_template_template.cpp:76:17:   instantiated from here
variadic_template_template.cpp:66:55: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See &lt;http://bugzilla.redhat.com/bugzilla&gt; for instructions.
Preprocessed source stored into /tmp/ccaaQrEG.out file, please attach this to your bugreport.</pre>
<p>The error message can change from platform to platform, but I tested it on Fedora, Gentoo, Debian and others, with many different versions of gcc, and I always got this error. So I posted a bug report <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50303">on the gcc bugzilla</a>, where you can find some more technical details.</p>
<p>Can somebody test this with some other compiler? My icc refuses to work and I have no time right now to test the same with path64 or other compilers.</p>
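<p>In the meantime, one way to avoid packs of template template parameters altogether is to wrap each Action in an ordinary class, so that only a plain type pack is needed. The following is only a sketch: the tag types A1_tag and A2_tag and the functions g3 and f3 are hypothetical names introduced here, and whether this sidesteps the crash on the affected gcc versions is untested.</p>
<p><pre class="brush: cpp;">
#include &lt;iostream&gt;
using std::cout;

// Same Interface and Actions as above (only I1 is repeated here).
struct I1 {
	void print() { cout &lt;&lt; &quot;I1\n&quot;; }
};

template&lt;typename Interface&gt;
struct A1 {
	void print() { cout &lt;&lt; &quot;A1 - &quot;; Interface i; i.print(); }
};

template&lt;typename Interface&gt;
struct A2 {
	void print() { cout &lt;&lt; &quot;A2 - &quot;; Interface i; i.print(); }
};

// Hypothetical tag types (not in the original code): each one is an
// ordinary class that exposes an Action template as a nested template.
struct A1_tag { template&lt;typename I&gt; struct apply { typedef A1&lt;I&gt; type; }; };
struct A2_tag { template&lt;typename I&gt; struct apply { typedef A2&lt;I&gt; type; }; };

template&lt;class Interface&gt;
void g3() {
	cout &lt;&lt; &quot;End of iteration\n&quot;;
}

template&lt;class Interface, class Tag1, class ... Tags&gt;
void g3() {
	// Recover the Action from its tag and instantiate it on the Interface
	typename Tag1::template apply&lt;Interface&gt;::type action;
	action.print();
	g3&lt;Interface, Tags...&gt;();
}

template&lt;class Interface, class ... Tags&gt;
void f3() {
	cout &lt;&lt; &quot;Begin iteration\n&quot;;
	g3&lt;Interface, Tags...&gt;();
}
</pre></p>
<p>With these definitions, f3&lt;I1, A1_tag, A2_tag&gt;(); applies both actions to I1. The price is one extra tag type per Action.</p>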
]]></content:encoded>
			<wfw:commentRss>http://andyspiros.wordpress.com/2011/09/09/playing-with-c0x/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2094e3540201216badc3c0e707183d7a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andyspiros</media:title>
		</media:content>
	</item>
		<item>
		<title>Proposal for a new version of the BTL</title>
		<link>http://andyspiros.wordpress.com/2011/09/01/for-a-new-version-of-the-btl/</link>
		<comments>http://andyspiros.wordpress.com/2011/09/01/for-a-new-version-of-the-btl/#comments</comments>
		<pubDate>Thu, 01 Sep 2011 12:38:19 +0000</pubDate>
		<dc:creator>andyspiros</dc:creator>
				<category><![CDATA[Programmazione]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[btl]]></category>

		<guid isPermaLink="false">http://andyspiros.wordpress.com/?p=567</guid>
		<description><![CDATA[Over the last few months I have worked on benchmarking tasks and gained a good knowledge of the Bench Template Library (BTL), which is a very good, generic, extensible, accurate benchmarking library. However, I also found some problems in it and looked for solutions, at least in principle. In this document [...]]]></description>
			<content:encoded><![CDATA[<p>Over the last few months I have worked on benchmarking tasks and gained a good knowledge of the Bench Template Library (BTL), which is a very good, generic, extensible, accurate benchmarking library. However, I also found some problems in it and looked for solutions, at least in principle. In this document I will explain these problems and present a project for renewing the BTL.</p>
<p>When I refer to the BTL in this document, I mean the current version in the Eigen Mercurial repository.</p>
<h3>Foundations of the BTL</h3>
<p>The BTL is based upon <em>interfaces</em> to many numerical libraries. These are classes that wrap the calls to the library functions. The idea is the same as in the abstract class and inheritance paradigm, but here C++ templates are used and the wrapper methods are usually inlined, so the wrappers add little or no overhead, which makes the system reliable for benchmarking purposes.</p>
<p>In addition, a set of <em>actions</em> is defined. An <em>action</em> is a class that uses an <em>interface</em> to perform a particular numerical job of a specific size. For example, there is an <em>action</em> for the <em>axpy</em> operation, one for the triangular system solver, one for the matrix-vector product. Each action has the following public methods:</p>
<ul>
<li>A constructor which takes the problem size as integer; this creates the needed generic matrices and vectors, and copies them to the <em>interface</em>-specific objects.</li>
<li>An invalidated copy-constructor.</li>
<li>A destructor which frees the memory.</li>
<li><em>name</em> which returns the name of the action as std::string (along with the <em>interface</em> name).</li>
<li><em>nb_op_base</em> which returns the number of base (floating point) operations performed by the action.</li>
<li><em>initialize</em> which performs some preliminary task before computing the result. This is in theory useful for libraries (e.g. BLAS) that do in-place computations; then this method regenerates the input data.</li>
<li><em>calculate</em> which performs the actual computation.</li>
<li><em>check_result</em> which implements a basic check of the output of the computation; in case of failure it exits the program with an error message and code.</li>
</ul>
<p>All <em>actions</em> take as class template parameter an <em>interface</em>.</p>
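<p>To make the list above concrete, here is a minimal sketch of what an <em>axpy</em> action could look like. It is not the actual BTL code: naive_interface, the member names, the result() accessor and the fixed (non-random) input data are assumptions made for this example, and std::vector members take the place of the explicit destructor.</p>
<p><pre class="brush: cpp;">
#include &lt;cmath&gt;
#include &lt;cstddef&gt;
#include &lt;cstdlib&gt;
#include &lt;iostream&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical stand-in for a library interface.
struct naive_interface {
  static std::string name() { return &quot;naive&quot;; }
  // y &lt;- alpha*x + y, computed in place
  static void axpy(double alpha, const std::vector&lt;double&gt;&amp; x, std::vector&lt;double&gt;&amp; y) {
    for (std::size_t i = 0; i &lt; x.size(); ++i) y[i] += alpha * x[i];
  }
};

template&lt;class Interface&gt;
class action_axpy {
public:
  // Builds the input data for the given problem size.
  explicit action_axpy(int size)
    : size_(size), alpha_(0.5), x_(size), y_ref_(size), y_(size) {
    for (int i = 0; i &lt; size; ++i) { x_[i] = i; y_ref_[i] = 2.0 * i; }
    y_ = y_ref_;
  }
  action_axpy(const action_axpy&amp;) = delete;  // the &quot;invalidated&quot; copy constructor

  static std::string name() { return &quot;axpy_&quot; + Interface::name(); }
  // One multiplication and one addition per entry.
  double nb_op_base() const { return 2.0 * size_; }

  // Restores the input that calculate() overwrites in place.
  void initialize() { y_ = y_ref_; }
  void calculate() { Interface::axpy(alpha_, x_, y_); }

  // Exits the program on failure, as the current BTL does.
  void check_result() const {
    for (int i = 0; i &lt; size_; ++i) {
      if (std::fabs(y_[i] - (y_ref_[i] + alpha_ * x_[i])) &gt; 1e-12) {
        std::cerr &lt;&lt; name() &lt;&lt; &quot;: wrong result\n&quot;;
        std::exit(2);
      }
    }
  }
  const std::vector&lt;double&gt;&amp; result() const { return y_; }

private:
  int size_;
  double alpha_;
  std::vector&lt;double&gt; x_, y_ref_, y_;
};
</pre></p>
<p>Note that calling calculate() twice without initialize() in between would make check_result() fail here, because y is updated in place.</p>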
<p>The real BTL code resides in the <em>generic_bench</em> directory, where a set of utilities and other generic functions are defined. The utilities allow the <em>actions</em> to construct random matrices and vectors. The function <em>bench</em> and the <em>PerfAnalyzer</em> classes are the real core of the BTL and do a great job.</p>
<p>Now, for each library that has to be benchmarked, a program has to be compiled, linked and executed. The program is based upon main.cpp files that are delivered along with the <em>interface</em> – the main.cpp is slightly <em>interface</em>-dependent – and is built according to instructions written for the CMake tool.</p>
<h3>Drawbacks of the BTL</h3>
<p>I will now point out some problems that I found in the BTL during my work:</p>
<ul>
<li><strong>Input data</strong>. For the benchmarks to be unbiased, all interfaces have to be in the same situation when doing a job; this includes the input matrices and vectors. Basic linear algebra algorithms like matrix-vector multiplication or vector sum usually do exactly the same work on any input data, but more complex algorithms, like sparse solvers, have a run time and a number of floating point operations that depend strongly on the input data. For example, the Conjugate Gradient Method could need a very different number of iterations when applied to two equally sized, slightly different matrices and starting vectors. This is why a different, more advanced way to construct random matrices has to be considered. We will do this by means of a deterministic random number generator. In other words, the construction of the input matrices has to be reproducible.</li>
<li><strong>Results representation</strong>. In the current version, the benchmarking results are expressed in megaflops (millions of floating point operations per second). This is a common way to represent results, useful and handy, but sometimes wrong. One of the objectives of the BTL, at least in my opinion, is to be generic. The design of the library and the implementation of the core functions are actually a very good starting point for a really generic benchmarking library; in my project I could add FFTW and distributed-memory benchmarks by just adding interfaces and actions, with few changes to the core library. The action-specific number of base operations, however, is not very generic. Two different libraries can use different algorithms to solve the same problem, resulting in a different number of floating point operations; the number of operations can depend on the input data (for sparse algorithms this is often the case); it can depend strongly on the size of the problem, as for the FFT, where the algorithms can be more or less optimized depending on whether the size is a product of small prime numbers or not. Each of these cases (quite common for non-basic tasks) leads to situations where the action class cannot correctly determine the number of floating point operations involved in the calculation, resulting in benchmarking outputs that are useful for comparing different libraries, but basically wrong. Some thought will be given to solving this issue.</li>
<li><strong>The Bench Template Library is not a library</strong>. This is not a real problem, but it deserves some thought. A library is defined as a set of classes and functions (and objects, macros and other entities) that are used in other programs. The BTL is actually a library <strong>and</strong> a set of program sources <strong>and</strong> a set of compile and run instructions. It would be possible to design a real library with some functions that perform the benchmarks of different libraries and provide the results by means of a predefined interface (directly to a file or as the return value of a function); this library could be used in a program independently of the compile instructions. This leads to some problems; some of them are solvable and are treated below.</li>
<li><strong>Mixed use of the STL</strong>. The BTL sometimes makes use of STL objects and at other times relies on new/delete operations. This is not a problem for the current version of the BTL, but it makes things slightly more difficult for the reader, or for the programmer who wants to extend the BTL, for instance by adding new actions. My proposal is to make more extensive use of the STL, in particular of the vector class template, in order to simplify some operations and avoid memory leaks for good. The new C++0x move construction (and, in many C++98 compilers, optimizations such as return value optimization) also avoids copy-construction of vectors when it is not needed, resulting in an easier, less error-prone programming paradigm with the same performance and memory usage.</li>
<li><strong>Initialize</strong>. Some attention should be paid to the <em>initialize</em> and <em>calculate</em> methods of the actions. The <em>initialize</em> method should prepare the input structures for the <em>calculate</em> operation. This means that <em>initialize</em> should be called before every call to <em>calculate</em>. This is not the case, at least in the current version: <em>initialize</em> is called once, then <em>calculate</em> is called many times (so that the measured time is at least 0.2 seconds). For an in-place operation, this is a problem. Consider the Cholesky decomposition: a symmetric positive definite (SPD) matrix A is constructed and a copy of it is kept as A_ref; the <em>calculate</em> method calls the library function that reads A and writes the resulting L matrix into the same memory location as A; after this, A is not an SPD matrix anymore, so A_ref should be copied into A before the computation is performed again. Now a few interconnected considerations have to be made. Firstly: the time required by <em>initialize</em> should probably not be taken into account in the benchmark; this would require the timer to measure just one execution of <em>calculate</em> and then stop, losing the benefit of the current measurement process, which lets the timer measure the time required by many runs; as a result, the benchmark could be inaccurate for small sizes. Secondly: not-in-place implementations do not present such problems, probably because they internally perform similar copy operations; therefore some care has to be taken when comparing in-place and not-in-place implementations. Lastly, some strict rules have to be defined in order to be completely unambiguous about the content of both functions (which operations may be placed in <em>initialize</em> and not be benchmarked, and which are restricted to <em>calculate</em>).</li>
<li><strong>Checking the result</strong>. As said, the action itself checks the results and, in case of failure, exits the program. This seems a naive behavior: in case of an error an exception should be thrown or, if we want to avoid C++ exceptions (as I do), a <em>false</em> value should be returned by the check method, and the BTL itself (not each action separately) should decide what to do, how to report the error to the user and how to continue the benchmarks.</li>
<li><strong>More flexibility</strong>. As an improvement of the BTL, some other use cases could be considered. As stated before, I adapted the BTL to also support distributed-memory libraries (ScaLAPACK). To do things well, some more internal changes could be considered in order to make the BTL even more generic.</li>
</ul>
<p>In the following I will propose some possible solutions to the problems that I pointed out. Let&#8217;s start with the most challenging points.</p>
<h3>How to benchmark different interfaces together</h3>
<p>In the current version of the BTL, the bench function takes as template parameter an <em>action</em>, which in turn takes as template parameter an <em>interface</em>. Instead, we could imagine a new bench function which makes use of variadic template arguments and tuples, both introduced in the C++0x standard, which is now supported by many C++ compilers. The first set of arguments (the first tuple or, more precisely, the first tuple specialization) would describe the interfaces, the second one the actions.</p>
<p><pre class="brush: cpp;">
typedef std::tuple &lt;
  eigen3_interface,
  blitz_interface,
  ublas_interface
&gt; interfaces_tuple;

typedef std::tuple &lt;
  action_axpy,
  action_matrix_vector_product,
  action_cholesky_decomposition
&gt; actions_tuple;

int main()
{
  bench&lt;interfaces_tuple, actions_tuple&gt;(/* arguments of bench */);
}
</pre></p>
<p>Then, bench would iterate over actions, then over interfaces, and generate all possible combinations. Benchmarking all the interfaces together with respect to a specific action would solve some problems; for instance, it would help to avoid overly long benchmark run times while keeping the accuracy of the results as high as possible.</p>
<p>Now, there is a problem with libraries that have the same interface. The most important example is BLAS: BLAS is a standardized interface with a reference implementation written in Fortran and other optimized implementations written in Fortran, C or C++ that expose the same interface. The implementation is chosen by the library the executable is linked with, so with this method it is impossible to run two different BLAS implementations in the same execution. There is a solution: load the desired library at run time. Consider the following class:</p>
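<p>The double iteration can be sketched with pack expansions. Since std::tuple only accepts types, the action templates cannot be listed in it directly; the sketch below uses two hypothetical list types, interface_list and action_list, instead. The interfaces and actions are empty stand-ins that only provide a name, and bench merely records the order of the combinations where a real implementation would run the measurements.</p>
<p><pre class="brush: cpp;">
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical stand-ins for two interfaces and two actions.
struct eigen3_interface { static std::string name() { return &quot;eigen3&quot;; } };
struct blitz_interface  { static std::string name() { return &quot;blitz&quot;; } };

template&lt;class Interface&gt; struct action_axpy {
  static std::string name() { return &quot;axpy - &quot; + Interface::name(); }
};
template&lt;class Interface&gt; struct action_matrix_vector_product {
  static std::string name() { return &quot;matrix_vector - &quot; + Interface::name(); }
};

// Hypothetical list types: one for classes, one for class templates.
template&lt;class ... Interfaces&gt; struct interface_list {};
template&lt;template&lt;class&gt; class ... Actions&gt; struct action_list {};

// Inner loop: one action applied to every interface of the list.
template&lt;template&lt;class&gt; class Action, class ... Interfaces&gt;
void bench_action(interface_list&lt;Interfaces...&gt;, std::vector&lt;std::string&gt;&amp; log) {
  // Elements of a braced list are evaluated from left to right.
  int expand[] = { 0, (log.push_back(Action&lt;Interfaces&gt;::name()), 0)... };
  (void)expand;
}

// Outer loop over the actions.
template&lt;class ... Interfaces, template&lt;class&gt; class ... Actions&gt;
std::vector&lt;std::string&gt; bench(interface_list&lt;Interfaces...&gt; interfaces,
                               action_list&lt;Actions...&gt;) {
  std::vector&lt;std::string&gt; log;
  int expand[] = { 0, (bench_action&lt;Actions&gt;(interfaces, log), 0)... };
  (void)expand;
  return log;
}
</pre></p>
<p>A call would then look like bench(interface_list&lt;eigen3_interface, blitz_interface&gt;(), action_list&lt;action_axpy, action_matrix_vector_product&gt;()); and visits both interfaces for axpy before moving on to the matrix-vector product.</p>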
<p><pre class="brush: cpp;">
#include &lt;dlfcn.h&gt;    // dlopen, dlsym, dlerror
#include &lt;iostream&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

class BLAS_interface
{
public:
  BLAS_interface(const std::string&amp; library) : axpy_func(NULL) {
    void *handle = dlopen(library.c_str(), RTLD_NOW);
    if (!handle) {
      std::cerr &lt;&lt; &quot;Failed loading &quot; &lt;&lt; library &lt;&lt; &quot;\n&quot;;
      return;  // axpy_func stays NULL
    }

    dlerror();  // clear any stale error before calling dlsym
    axpy_func = reinterpret_cast&lt;axpy_t&gt;(dlsym(handle, &quot;daxpy_&quot;));
    const char *error = dlerror();
    if (error != NULL)
      std::cerr &lt;&lt; error &lt;&lt; &quot;\n&quot;;
  }

  // C++-friendly axpy interface: y &lt;- alpha*x + y
  void axpy(
    const double&amp; alpha,
    const std::vector&lt;double&gt;&amp; x,
    std::vector&lt;double&gt;&amp; y
  ) {
    const int N = x.size();
    const int iONE = 1;
    axpy_func(&amp;N, &amp;alpha, &amp;x[0], &amp;iONE, &amp;y[0], &amp;iONE);
  }

private:
  typedef void (*axpy_t)(const int*, const double*, const double*, const int*, double*, const int*);
  axpy_t axpy_func;
};
</pre></p>
<p>This solves the problem of having different libraries with the same interface. But a new issue arises: until now the interfaces did not need to be instantiated, because the methods were static and no state was stored; with this version one has to instantiate an object of the BLAS_interface class and use that object to call the functions. Therefore a template parameter is no longer sufficient to define an interface, and an actual object is needed. This is not a problem, but we have to clearly state that the interfaces are no longer <em>static classes</em> and have to be instantiated. Nothing prevents them from being singleton classes, which would actually be a sensible design pattern for the interfaces that do not need an externally linked library, or that only have one possible external library.</p>
<p>There is one more point to take into account: what about interfaces that load an external library that in turn loads another external library? This seems an exotic situation, but in fact it is a common one: LAPACK implementations usually rely on a BLAS implementation for the basic operations. Is it possible to benchmark the LAPACK reference implementation separately with the OpenBLAS BLAS implementation and with the ATLAS BLAS implementation? Yes, it is possible, but it requires some changes to the simple framework that I just presented.</p>
<p>When opening an external shared library, one can add the option RTLD_GLOBAL, which makes its symbols available to all subsequently opened shared libraries. Incidentally, opening a BLAS library this way is necessary in order to open a LAPACK library. This is where we can choose the BLAS implementation to use with a specific LAPACK implementation. The only problem is that we have to close the shared object files of both the BLAS and the LAPACK implementation if we want to change the BLAS implementation. Therefore I propose the following: each interface, along with the computation methods, has two more methods, <em>prepare</em> and <em>clean</em>. They are called just before and just after the calls to the computation methods, and at any time at most one interface is <em>prepared</em>. For most interfaces both methods can be empty, while for LAPACK interfaces, or any other interface to a library whose dependencies we want to manage, they could look something like this:</p>
<p><pre class="brush: cpp;">
#include &lt;cmath&gt;      // std::sqrt
#include &lt;dlfcn.h&gt;    // dlopen, dlsym, dlclose
#include &lt;string&gt;
#include &lt;vector&gt;

class LAPACK_interface
{
public:
	LAPACK_interface (const std::string&amp; blas, const std::string&amp; lapack)
		: blas_(blas), lapack_(lapack)
	{ }

	void prepare() {
		// RTLD_GLOBAL makes the BLAS symbols visible to the LAPACK library
		handleBLAS = dlopen(blas_.c_str(), RTLD_NOW | RTLD_GLOBAL);
		handleLAPACK = dlopen(lapack_.c_str(), RTLD_NOW);

		dgesv_func = reinterpret_cast&lt;dgesv_t&gt;(dlsym(handleLAPACK, &quot;dgesv_&quot;));
	}

	void clean() {
		dlclose(handleLAPACK);
		dlclose(handleBLAS);
	}

  // C++-friendly dgesv interface: B &lt;- A^-1 * B
  // Matrix storage is column-major
  // Returns the pivots and changes B
  std::vector&lt;int&gt; solve_general(
    std::vector&lt;double&gt; A,  // by value: dgesv overwrites A with its LU factors
    std::vector&lt;double&gt;&amp; B
  ) {
  	const int N = static_cast&lt;int&gt;(std::sqrt(static_cast&lt;double&gt;(A.size())));
  	const int NRHS = B.size() / N;
  	std::vector&lt;int&gt; ipiv(N);
  	int info;
  	dgesv_func(&amp;N, &amp;NRHS, &amp;A[0], &amp;N, &amp;ipiv[0], &amp;B[0], &amp;N, &amp;info);
        return ipiv;
  }

private:
	typedef void (*dgesv_t)(const int*, const int*, double*, const int*, int*, double*, const int*, int*);

	const std::string blas_, lapack_;
	void *handleBLAS, *handleLAPACK;
	dgesv_t dgesv_func;
};
</pre></p>
<h3>The new algorithm</h3>
<p>All interfaces will be tested at the same time, which requires a new algorithm for the benchmarks. Briefly, testing all the interfaces on an action of a given size is split into two stages.</p>
<p>The first stage determines how many times the calculation has to be run in order to obtain an unbiased and accurate benchmark. The BTL picks a seed, generates the corresponding input matrices and loops over the interfaces: for each interface the input is set, then the timer measures the time required to run the computation. This is repeated for several seeds, and the times are summed for each interface, until the slowest interface reaches a given total time. The number of tested seeds is called &#8220;maxseed&#8221;, and the same seeds are used in the second stage. The slowest interface will run each seed once, while every other interface will run each seed as many times as the ratio between the total time of the slowest interface and its own total time.</p>
<p>In the second stage the benchmarks are run the computed number of times, separately for each interface, with the same seeds as in the first stage.</p>
<p>Since this explanation is hard to follow, here is an example. Imagine we have three interfaces I1, I2 and I3 and want to test them on the matrix-matrix product action with 300-by-300 matrices. We start with seed 0 and generate the set of input matrices with this seed. Then we let each interface work on these matrices and measure the time required: say I1 takes 0.3 seconds, I2 takes 0.9 seconds and I3 takes 0.4 seconds. Then we start again with a new seed (i.e. 1) and do the same; we sum the results for each interface and now have the values 0.6, 1.7 and 0.8. Say our stop time is 2 seconds; then we have to run the computation (at least) one more time with a different seed. We pick seed 2 and after the computation we have the following values: I1 -&gt; 0.9 sec, I2 -&gt; 2.7 sec and I3 -&gt; 1.3 sec. We stop because the slowest interface (I2) has reached the 2 second limit. This means that I1 is approximately three times as fast as I2, and I3 is about twice as fast as I2.</p>
<p>During the second stage we pick the first interface, iterate over the seeds 0..2, let the interface compute the result three times for each seed, and finally average the measurements. We do the same with the interface I2, but this time we let it compute the result just once for each seed. I3 will compute the result twice for each seed. Finally, the average for each interface is stored as the result. Optionally, we could run the second stage more than once. The current BTL actually has a similar algorithm and runs this second stage 3 times; this value is customizable.</p>
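<p>The bookkeeping of the first stage can be sketched as follows. To keep the example deterministic, the timer is replaced by a table of already measured times; the names stage_plan and first_stage are hypothetical, and rounding the ratio to the nearest integer is an assumption, since the description above only says &#8220;approximately&#8221;.</p>
<p><pre class="brush: cpp;">
#include &lt;cmath&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

struct stage_plan {
  std::size_t maxseed;            // number of seeds used in the first stage
  std::vector&lt;long&gt; repetitions;  // runs per seed of each interface in the second stage
};

// times[s][i] is the time measured for interface i with seed s.
stage_plan first_stage(const std::vector&lt;std::vector&lt;double&gt; &gt;&amp; times, double stop_time) {
  const std::size_t n = times.empty() ? 0 : times[0].size();
  std::vector&lt;double&gt; total(n, 0.0);
  stage_plan plan;
  plan.maxseed = 0;
  double slowest = 0.0;

  // Add seeds until the slowest interface reaches the stop time.
  while (plan.maxseed &lt; times.size() &amp;&amp; slowest &lt; stop_time) {
    for (std::size_t i = 0; i &lt; n; ++i) {
      total[i] += times[plan.maxseed][i];
      if (total[i] &gt; slowest) slowest = total[i];
    }
    ++plan.maxseed;
  }

  // Faster interfaces repeat each seed to fill roughly the same total time.
  for (std::size_t i = 0; i &lt; n; ++i)
    plan.repetitions.push_back(total[i] &gt; 0.0 ? std::lround(slowest / total[i]) : 1);
  return plan;
}
</pre></p>
<p>Fed with the per-seed times of the example (0.3, 0.9, 0.4; then 0.3, 0.8, 0.4; then 0.3, 1.0, 0.5) and a stop time of 2 seconds, this yields maxseed = 3 and the repetitions 3, 1 and 2.</p>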
<h3>How to generate and store matrices</h3>
<p>The BTL stores matrices as STL vectors of vectors. This makes it easy to read and write the matrix using indices and allows direct retrieval of the number of rows and columns. As a drawback, it is slightly awkward to read or write the matrix sequentially, with either a row-major or a column-major strategy. I consider this second need more important and would therefore store the matrices as single long vectors, using a lexicographic ordering, and more precisely the column-major storage strategy, which seems to be the most widely used one. As the actions are responsible for the generation of the matrices, they also know their shapes and will have no problems when computing things with them, or when delegating the computation to the interfaces. The same strategy can apply to N-dimensional arrays, which are useful in some tasks, like FFT.</p>
<p>The generation of a matrix is governed by the shape of the matrix and a seed, which is an integer. The seed is the initial value for a deterministic random number generator. A very simple yet effective one is the linear congruential random number generator. The matrix generator would take a shape (specializations or overloads will be provided for vectors and matrices) and a seed, and return a std::vector with random entries representing a vector, matrix or higher-dimensional array. Notice that the new move semantics in the C++0x standard will avoid useless and expensive copies.</p>
<p>The last paragraph presented the simplest case, i.e. dense random matrices. For symmetric matrices, only the upper part is generated, and the lower part is just copied from it. The same strategy will be used for SPD matrices, but in addition the value N·maxv is added to each diagonal entry, where maxv is the absolute value of the largest possible entry in the matrix (i.e. the largest value returned by the random number generator).</p>
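<p>A possible shape for these generators is sketched below. The class Lcg and the function names are hypothetical; the multiplier and increment are the well-known Numerical Recipes constants, chosen here only as an example, and the entries are assumed to lie in [0, 1), so that maxv is taken to be 1.</p>
<p><pre class="brush: cpp;">
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;vector&gt;

// Minimal linear congruential generator, modulus 2^32.
class Lcg {
public:
  explicit Lcg(std::uint32_t seed) : state_(seed) {}
  // Next value in [0, 1), built from the 24 high bits of the state.
  double next() {
    state_ = 1664525u * state_ + 1013904223u;
    return (state_ &gt;&gt; 8) / 16777216.0;
  }
private:
  std::uint32_t state_;
};

// Dense n-by-m matrix, column-major: entry (i, j) is stored at i + j*n.
std::vector&lt;double&gt; random_matrix(std::size_t n, std::size_t m, std::uint32_t seed) {
  Lcg gen(seed);
  std::vector&lt;double&gt; a(n * m);
  for (std::size_t k = 0; k &lt; a.size(); ++k) a[k] = gen.next();
  return a;
}

// Symmetric n-by-n matrix: the upper part is generated, the lower part copied.
std::vector&lt;double&gt; random_symmetric(std::size_t n, std::uint32_t seed) {
  Lcg gen(seed);
  std::vector&lt;double&gt; a(n * n);
  for (std::size_t j = 0; j &lt; n; ++j)
    for (std::size_t i = 0; i &lt;= j; ++i) {
      a[i + j * n] = gen.next();
      a[j + i * n] = a[i + j * n];
    }
  return a;
}

// SPD n-by-n matrix: symmetric, with N*maxv added to the diagonal.
std::vector&lt;double&gt; random_spd(std::size_t n, std::uint32_t seed) {
  const double maxv = 1.0;  // upper bound of Lcg::next()
  std::vector&lt;double&gt; a = random_symmetric(n, seed);
  for (std::size_t i = 0; i &lt; n; ++i) a[i + i * n] += n * maxv;
  return a;
}
</pre></p>
<p>The shift makes the matrix strictly diagonally dominant with a positive diagonal, which is enough to guarantee that a symmetric matrix is positive definite. Calling a generator twice with the same shape and seed returns the same matrix.</p>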
<h3>How to represent and store the results</h3>
<p>In my opinion, the benchmarking library should just measure the performance of a library as well as possible, i.e. the time spent on a computation. It should not be concerned with the representation of this data. Therefore, the output of a benchmarking library should just be the time in seconds. Whoever later presents the data can decide to plot the time against the problem size, or try to obtain the number of floating point operations required by one computation and plot the MFlops against the problem size; this does not matter to the BTL. Therefore, I would save just the sizes and the seconds into the files.</p>
<p>Now a short digression on how to plot the data. I see three possibilities:</p>
<ul>
<li><strong>Time vs. Size</strong>. This has many drawbacks. The only sensible option seems to be a loglog plot, because a semilogx plot would make the times for small sizes impossible to distinguish, and a linear x axis would squeeze this information into the left edge of the plot. The loglog plot could be useful to understand the time complexity of the algorithm, but is not very useful for comparing different libraries.</li>
<li><strong>MFlops vs. Size</strong>. This is the most used plot and has many advantages. It solves the issues of time vs. size, but has the drawback that I already explained: the MFlops figure can be wrong.</li>
<li><strong>Time vs. Size with respect to a reference result</strong>. In this plot, a reference implementation is picked and its result is displayed as a straight line with value 1; the time required by the other implementations is displayed as a fraction of the time required by the reference implementation. Provided that the differences between the reference results and the others are not too big, this plot combines the benefits of both methods. A semilogx plot would be a reasonable choice for the axes.</li>
</ul>
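<p>The third kind of plot only needs the measured times divided by the times of the reference implementation. A minimal sketch follows; the function name and the numbers are invented for illustration and are not measured data.</p>

```python
def relative_times(times, reference):
    """Divide every library's timings by the reference library's timings.

    `times` maps a library name to a list of run times in seconds, one per
    problem size. The reference library maps to a constant line at 1.0.
    """
    ref = times[reference]
    return {
        name: [t / r for t, r in zip(values, ref)]
        for name, values in times.items()
    }


# Made-up timings in seconds for three problem sizes (not measured data).
example = {
    "reference": [0.5, 2.0, 8.0],
    "other": [0.25, 2.0, 12.0],
}
ratios = relative_times(example, "reference")
# ratios["reference"] is [1.0, 1.0, 1.0]; ratios["other"] is [0.5, 1.0, 1.5]
```

<p>Values below 1 mean that a library is faster than the reference for that size, values above 1 that it is slower.</p>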
<p>Now a few remarks on how to store the data. The BTL currently writes one file for each tested library and each action, and each file contains two columns: one for the sizes, one for the results (in MFlops). If we want to test the same sizes on all libraries, this makes the information redundant, since the sizes are written many times. I would instead place all libraries for a given action in the same file, which makes them easier to read and compare, takes less space and is a cleaner way to save the results. Moreover, DAT files have some drawbacks, which led me to write a C++ library for handling MAT files <a href="http://n.ethz.ch/~arteagaa/matfile/">as my bachelor thesis</a>. MAT files are the <a href="http://www.mathworks.ch/help/pdf_doc/matlab/matfile_format.pdf">standard used by Matlab</a> for storing data: they can store more than one matrix, support named arrays, store the data in binary form, occupy less space than DAT files and follow an open format that requires no external tools other than zlib. I propose the following structure for a BTL file:</p>
<ul>
<li>The header can contain some information: here we would add the name of the action.</li>
<li>The first array would contain the sizes (integer array). Its name would be <em>sizes</em>.</li>
<li>The second array would contain the number of seeds used for each size. Its name would be <em>seeds</em>.</li>
<li>Then, each library would have an array with the times in seconds, whose name would be the name of the library.</li>
</ul>
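<p>To make the proposed layout concrete, here is a sketch that models one such file as a plain Python dictionary of named arrays. It does not write a real MAT file; the function names, the action name and the library name are hypothetical.</p>

```python
def make_result_file(action, sizes, seeds):
    """Create the in-memory layout: header, `sizes`, `seeds`, no libraries yet."""
    if len(sizes) != len(seeds):
        raise ValueError("need one seed count per size")
    return {"header": action, "sizes": list(sizes), "seeds": list(seeds)}


def add_library(result_file, name, seconds):
    """Add one library's timings (in seconds) under the library's name."""
    if name in result_file:
        raise ValueError("array name already in use: " + name)
    if len(seconds) != len(result_file["sizes"]):
        raise ValueError("need one timing per size")
    result_file[name] = list(seconds)


# Hypothetical action and library names, with invented timings.
results = make_result_file("matrix_vector", [10, 100, 1000], [5, 5, 3])
add_library(results, "libA", [1e-6, 1e-4, 1e-2])
```

<p>With SciPy available, a dictionary of named arrays like this one could be handed to scipy.io.savemat; how the action name would end up in the MAT header is left open here.</p>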
<p>This would allow the BTL to add libraries to the file at a later time in a very consistent way, running exactly the same tests as for the other libraries. It would also be possible to add an array mapping array names to library descriptions in order to generate more user-friendly graphs. Note that Matlab, Octave, NumPy (via SciPy) and other widely used numerical software can handle MAT files.</p>
]]></content:encoded>
			<wfw:commentRss>http://andyspiros.wordpress.com/2011/09/01/for-a-new-version-of-the-btl/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2094e3540201216badc3c0e707183d7a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andyspiros</media:title>
		</media:content>
	</item>
		<item>
		<title>Benchmarking suite – Final Report</title>
		<link>http://andyspiros.wordpress.com/2011/08/22/benchmarking-suite-final-report/</link>
		<comments>http://andyspiros.wordpress.com/2011/08/22/benchmarking-suite-final-report/#comments</comments>
		<pubDate>Mon, 22 Aug 2011 18:23:39 +0000</pubDate>
		<dc:creator>andyspiros</dc:creator>
				<category><![CDATA[Google Summer of Code]]></category>
		<category><![CDATA[gentoo]]></category>
		<category><![CDATA[soc]]></category>

		<guid isPermaLink="false">http://andyspiros.wordpress.com/?p=557</guid>
		<description><![CDATA[This is the final report of the project &#8220;Automated benchmark suite for numerical libraries in Gentoo&#8221; for the Google Summer of Code 2011. Project description The project aims to develop a simple yet powerful automated system of benchmarking for numerical libraries. The Gentoo software system provides many implementations of widely used standards such as BLAS, [...]]]></description>
			<content:encoded><![CDATA[<p>This is the final report of the project &#8220;Automated benchmark suite for numerical libraries in Gentoo&#8221; for the Google Summer of Code 2011.</p>
<h3>Project description</h3>
<p>The project aims to develop a simple yet powerful automated benchmarking system for numerical libraries. The Gentoo software system provides many implementations of widely used standards such as BLAS, CBLAS, LAPACK and ScaLAPACK, as well as other numerical libraries such as FFTW and MKL. The developed tools will help the system maintainer choose the best-suited implementation for the machine hardware and test the same implementation, or different ones, with different compilers, compiler versions and compile flags.<span id="more-557"></span></p>
<h3>Release</h3>
<p>This report refers to the 0.1 release, which is the <a href="http://git.overlays.gentoo.org/gitweb/?p=proj/auto-numerical-bench.git;a=tree;h=40c901a4c2be8bf89e20a45e29130f8488176923;hb=587860cfcdc6845385bc89a7e49bca90caaff4a1">commit tagged with &#8220;0.1&#8221;</a> in the git repository.</p>
<h3>Achieved objectives</h3>
<p>In the original <a href="http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/spiros/1">project description</a>, I stated that I would deliver a benchmarking suite and a script: the former acting as &#8220;data provider&#8221; for the benchmarking results, the latter as organizer of the work to do, user interface, base for the customization, &#8230; During the summer I decided to adopt the BTL (Bench Template Library) as the benchmarking suite, and much work has been devoted to it; the BTL was designed from the start to be extensible, and I used almost all of its extension points to adapt it to my purposes. In fact, I discovered that the BTL is not quite perfect and decided that, after the GSoC, I will spend some time figuring out how this very powerful library could be improved further. In any case, it has performed well for my GSoC project, and the modified version is available in <a href="https://bitbucket.org/spiros/btl">my hg repository</a>.</p>
<p>The script is in fact a Python library that:</p>
<ul>
<li>Uses the features provided by Portage to search, compile and emerge packages in a separate root; resolves the dependencies (this part is still unstable); follows the user&#8217;s instructions on the environment to use during the emerge process; stores the generated binary packages for future use (instructions for the user are printed at the end).</li>
<li>Writes everything to comprehensive and organized logs.</li>
<li>Interprets a highly customizable configuration file provided by the user to define the tests that have to be run.</li>
<li>Compiles the benchmarking suite with the correct options, flags, libraries,&#8230;</li>
<li>Wraps the suite execution and provides user-friendly output; stores its results in an organized fashion.</li>
<li>Collects the results and plots them.</li>
<li>Saves the plots and generates a comprehensive HTML report; the report contains, along with the images, information about the system and the time, the logs, the configuration file and optionally a summary figure.</li>
<li>Cleans the system.</li>
</ul>
<p>The script is modular: a module has to be provided for each library. The module specifies where the main source of the particular benchmarking suite is, how to compile it, how to run it, how to interpret the results,&#8230; As most of the libraries are tested through the modified BTL, the generic code lives in an abstract BTL module, and the specific modules just inherit from it and add a few pieces of information. The following modules make use of the BTL:</p>
<ul>
<li>blas</li>
<li>cblas</li>
<li>lapack</li>
<li>scalapack</li>
<li>fftw</li>
</ul>
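<p>The inheritance scheme can be pictured roughly as follows. This is a hypothetical sketch, not the actual numbench code: class names, attribute names, source file names and the output name &#8220;bench&#8221; are all assumptions made for illustration.</p>

```python
class BTLModule:
    """Generic part shared by all BTL-based modules (hypothetical sketch)."""

    # Subclasses fill in these two attributes.
    source = None      # main source file of the benchmark
    tests = ()         # names of the tests the module can run

    def compile_command(self, compiler, flags, libraries):
        """Build the compiler command line as a list of arguments."""
        return [compiler, *flags, self.source, "-o", "bench", *libraries]

    def print_help(self):
        print("Available tests:", ", ".join(self.tests))


class BlasModule(BTLModule):
    source = "blas_main.cpp"              # made-up file name
    tests = ("axpy", "matrix_vector")     # made-up test names


class LapackModule(BTLModule):
    source = "lapack_main.cpp"            # made-up file name
    tests = ("lu_decomp", "cholesky")


command = BlasModule().compile_command("g++", ["-O2"], ["-lblas"])
# command is ["g++", "-O2", "blas_main.cpp", "-o", "bench", "-lblas"]
```

<p>Everything that is common to the BTL-based benchmarks stays in the base class, so a new module only has to state what differs.</p>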
<p>The following modules instead use a different benchmarking suite, which is now part of the modified BTL but follows a different implementation paradigm:</p>
<ul>
<li>blas_accuracy</li>
<li>lapack_accuracy</li>
</ul>
<p>These test the accuracy of the implementations instead of the computational speed.</p>
<p>The following module does not use a benchmarking suite, but relies on the information provided by the executable that is contained in the package:</p>
<ul>
<li>metis</li>
</ul>
<p>A total of 8 modules are provided. This exceeds the initial expectations; to be honest, I have to say that some of them (in particular blas_accuracy and metis) are very basic modules and lack some features, while the scalapack module has not been tested much and is to be considered unstable.</p>
<h3>Documentation</h3>
<p>A web page is available at <a href="http://soc.dev.gentoo.org/~spiros/">http://soc.dev.gentoo.org/~spiros/</a>. It contains some documentation on how to install the numbench package. Since numbench is very Gentoo-specific, only instructions on how to install it on Gentoo are provided. In fact, it is completely useless without the emerge and equery commands. The page also gives an overview of how to run the script.</p>
<p>The package installs a man page, numbench(1), that explains in more detail how to configure and run a test and where the logs and the results are, and gives some more information. A set of sample configuration files comes with the package, too, and is installed into /usr/share/numbench/samples.</p>
<p>The source (which is available in the <a href="http://git.overlays.gentoo.org/gitweb/?p=proj/auto-numerical-bench.git;a=summary">auto-numerical-bench git repository</a> for the script and in the <a href="https://bitbucket.org/spiros/btl">cited mercurial repository</a> for the benchmarking suite) contains some comments that could be useful for developers who want to write new modules or improve the project. In any case, if you plan to add features, please contact me!</p>
<h3>License</h3>
<p>The license of the modified BTL has not changed: it is re-released under the GPL-2 license. The same license is adopted for the Python part.</p>
<h3>Results</h3>
<p>A set of results is available <a href="http://soc.dev.gentoo.org/~spiros/Results/">on the web page</a>. They cover almost all available modules and also give examples of configuration files.</p>
<h3>Acknowledgements</h3>
<p>First of all, thanks to Google for sponsoring such a programme. It is of course useful for the students, who get the chance to do a real-world job (as a student I often feel the need for a similar experience in my academic activities), to earn something and, above all, at least for us Gentoo students, to get in touch with the FLOSS community.</p>
<p>Thanks to Donnie and the mentors, who make Gentoo a perfect choice for a student who wants to participate in the SoC.</p>
<p>Many thanks to the whole Gentoo community, which is always responsive and helpful. The forums, IRC channels and mailing lists are full of experts who are ready to help you whenever you need it. Being a SoC student makes you feel like a privileged and very respected person, which is just wonderful.</p>
<p>And very big thanks to my mentor Sébastien, who helped me a lot, almost daily, during the whole summer. Working with Sébastien is very motivating and his testing work has been very helpful. Thanks very much!</p>
<p>That&#8217;s all!</p>
<p>Best regards<br />
Andrea Arteaga</p>
]]></content:encoded>
			<wfw:commentRss>http://andyspiros.wordpress.com/2011/08/22/benchmarking-suite-final-report/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2094e3540201216badc3c0e707183d7a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andyspiros</media:title>
		</media:content>
	</item>
		<item>
		<title>Benchmarking suite – Report 12</title>
		<link>http://andyspiros.wordpress.com/2011/08/16/benchmarking-suite-report-12/</link>
		<comments>http://andyspiros.wordpress.com/2011/08/16/benchmarking-suite-report-12/#comments</comments>
		<pubDate>Mon, 15 Aug 2011 22:14:17 +0000</pubDate>
		<dc:creator>andyspiros</dc:creator>
				<category><![CDATA[Google Summer of Code]]></category>
		<category><![CDATA[gentoo]]></category>
		<category><![CDATA[soc]]></category>

		<guid isPermaLink="false">http://andyspiros.wordpress.com/?p=553</guid>
		<description><![CDATA[This is the report of the project &#8220;Automated benchmark suite for numerical libraries in Gentoo&#8221; for the week 8 &#8211; 15 August. Project description The project aims to develop a simple yet powerful automated system of benchmarking for numerical libraries. The Gentoo software system provides many implementations of widely used standards such as BLAS, CBLAS, [...]]]></description>
			<content:encoded><![CDATA[<p>This is the report of the project &#8220;Automated benchmark suite for numerical libraries in Gentoo&#8221; for the week 8 &#8211; 15 August.</p>
<h3>Project description</h3>
<p>The project aims to develop a simple yet powerful automated benchmarking system for numerical libraries. The Gentoo software system provides many implementations of widely used standards such as BLAS, CBLAS, LAPACK and ScaLAPACK, as well as other numerical libraries such as FFTW and MKL. The developed tools will help the system maintainer choose the best-suited implementation for the machine hardware and test the same implementation, or different ones, with different compilers, compiler versions and compile flags.<span id="more-553"></span></p>
<h3>Progress during the week</h3>
<p>This week has been devoted to the following activities:</p>
<ul>
<li>documentation</li>
<li>bugfix</li>
<li>logging</li>
</ul>
<h4>Documentation</h4>
<p>A man page has been written. It covers the script features, the configuration file structure, the working directories and the module structure. The same information is available by running the script with the -h or --help switch. More information about each module is available through the command &#8220;numbench [module] -h&#8221; (or --help). Each module has its own printHelp() function for this purpose.</p>
<p>A very basic article on how to install and run the script is <a href="http://andyspiros.wordpress.com/2011/08/08/testing-numbench/">available on my blog</a>.</p>
<h4>Bugfix</h4>
<p>Some bugs have been fixed in the BTL. Another bug, which sometimes makes the computed MFlops value negative, has been investigated but not yet fixed.</p>
<p>More bugs have been found in the script modules and have been fixed.</p>
<h4>Logging</h4>
<p>Logging is a really important feature for many reasons, and a new important log file has been added. It stores everything relevant that the script prints on the terminal. This is very useful for those who, for instance, start the benchmarks remotely: they no longer have to redirect the output somewhere in order not to lose it. The script simply writes everything to the main.log file in the log directory.</p>
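<p>The idea of sending the same messages to the terminal and to main.log can be sketched with the standard Python logging module. This is not the numbench implementation; the function name and the logger name are assumptions.</p>

```python
import logging
import os
import sys


def make_logger(log_dir):
    """Return a logger that prints to the terminal and appends to main.log."""
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger("numbench-sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # do not forward messages to the root logger
    logger.addHandler(logging.StreamHandler(sys.stdout))
    logger.addHandler(logging.FileHandler(os.path.join(log_dir, "main.log")))
    return logger
```

<p>Every call such as logger.info("Running test") then shows up both on the terminal and in the file, so nothing is lost when the terminal session goes away.</p>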
<h3>Plan for the next weeks</h3>
<p>Today was the &#8220;soft pencils down&#8221; date and, as already stated, all relevant work has been finished. During the next (last) week only bug fixes will be made and some documentation added.</p>
<p>Best regards<br />
Andrea Arteaga</p>
]]></content:encoded>
			<wfw:commentRss>http://andyspiros.wordpress.com/2011/08/16/benchmarking-suite-report-12/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2094e3540201216badc3c0e707183d7a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andyspiros</media:title>
		</media:content>
	</item>
		<item>
		<title>Testing numbench</title>
		<link>http://andyspiros.wordpress.com/2011/08/08/testing-numbench/</link>
		<comments>http://andyspiros.wordpress.com/2011/08/08/testing-numbench/#comments</comments>
		<pubDate>Mon, 08 Aug 2011 14:24:00 +0000</pubDate>
		<dc:creator>andyspiros</dc:creator>
				<category><![CDATA[Google Summer of Code]]></category>
		<category><![CDATA[Varie]]></category>
		<category><![CDATA[gentoo]]></category>
		<category><![CDATA[soc]]></category>

		<guid isPermaLink="false">https://andyspiros.wordpress.com/2011/08/08/testing-numbench/</guid>
		<description><![CDATA[This article explains how to install the numbench script and to run the benchmarks. Install app-benchmarks/numbench First of all, the science overlay has to be added through layman; therefore, install layman on your Gentoo system if you don&#8217;t have it already, and do: After that you will be able to install the package app-benchmarks/numbench. Remember [...]]]></description>
			<content:encoded><![CDATA[<p>This article explains how to install the numbench script and to run the benchmarks.</p>
<h3>Install app-benchmarks/numbench</h3>
<p>First of all, the science overlay has to be added through layman; therefore, install layman on your Gentoo system if you don&#8217;t have it already, and do:</p>
<p><pre class="brush: plain;">
layman -L
layman -a science
</pre></p>
<p>After that you will be able to install the package app-benchmarks/numbench. Remember that numbench is still unstable, so you will need to install the ~x86/~amd64 version.</p>
<p>The package installs the executable &#8220;numbench&#8221;, some Python data and a man page numbench(1).</p>
<p>I recommend adding the bicatali overlay through layman too, because it contains many numerical libraries that can be benchmarked, although we are migrating them into the science overlay.</p>
<h3>Run the benchmarks</h3>
<p>In order to run the benchmarks you have to provide a configuration file. The man page explains how to write one, and you will find some examples under /usr/share/numbench/samples. Once you have your configuration file (say conf.in), and you have chosen the module to test (e.g. blas, lapack or lapack_accuracy; see man numbench or numbench -h), just run the command</p>
<p><pre class="brush: plain;">
numbench module conf.in -s
</pre></p>
<p>The documentation explains how to run the test with more parameters in order to choose the tests that have to be performed.</p>
<p>After the execution you will find several directories of interest under ~/.benchmarks:</p>
<ul>
<li>log contains the logs, obviously; they are divided into subfolders in case of multiple runs</li>
<li>packages contains the binary packages, which are useful if you decide to install one of the tested libraries (this is not documented yet)</li>
<li>reports contains, for each run, a set of images, an HTML page and a copy of the logs; they are ready to be published somewhere: just copy the whole folder into your www directory</li>
<li>roots and tests are two directories used by the script for storing data; they are kept so that tests are not run again if the results already exist</li>
</ul>
<div>Please let me know if you find any bugs!</div>
]]></content:encoded>
			<wfw:commentRss>http://andyspiros.wordpress.com/2011/08/08/testing-numbench/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2094e3540201216badc3c0e707183d7a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andyspiros</media:title>
		</media:content>
	</item>
		<item>
		<title>Benchmarking suite – Report 11</title>
		<link>http://andyspiros.wordpress.com/2011/08/08/benchmarking-suite-%e2%80%93-report-11/</link>
		<comments>http://andyspiros.wordpress.com/2011/08/08/benchmarking-suite-%e2%80%93-report-11/#comments</comments>
		<pubDate>Mon, 08 Aug 2011 12:30:04 +0000</pubDate>
		<dc:creator>andyspiros</dc:creator>
				<category><![CDATA[Google Summer of Code]]></category>
		<category><![CDATA[gentoo]]></category>
		<category><![CDATA[soc]]></category>

		<guid isPermaLink="false">http://andyspiros.wordpress.com/?p=537</guid>
		<description><![CDATA[This is the report of the project &#8220;Automated benchmark suite for numerical libraries in Gentoo&#8221; for the week 1 &#8211; 7 August. Project description The project aims to develop a simple yet powerful automated system of benchmarking for numerical libraries. The Gentoo software system provides many implementations of widely used standards such as BLAS, CBLAS, [...]]]></description>
			<content:encoded><![CDATA[<p>This is the report of the project &#8220;Automated benchmark suite for numerical libraries in Gentoo&#8221; for the week 1 &#8211; 7 August.</p>
<h3>Project description</h3>
<p>The project aims to develop a simple yet powerful automated benchmarking system for numerical libraries. The Gentoo software system provides many implementations of widely used standards such as BLAS, CBLAS, LAPACK and ScaLAPACK, as well as other numerical libraries such as FFTW and MKL. The developed tools will help the system maintainer choose the best-suited implementation for the machine hardware and test the same implementation, or different ones, with different compilers, compiler versions and compile flags.<span id="more-537"></span></p>
<h3>Progress during the week</h3>
<p>This week has been devoted to the following features:</p>
<ul>
<li>Stabilization of the PBLAS/ScaLAPACK tests</li>
<li>Introduction of the METIS tests</li>
<li>Implementation of the LAPACK accuracy tests</li>
<li>Refinement of the LAPACK performance tests</li>
</ul>
<h4>METIS</h4>
<p>METIS is a package (a set of executables and a library) that performs some common preliminary numerical work: graph partitioning, mesh partitioning and sparse matrix reordering. The newly introduced metis.py module benchmarks the run times of the executables <em>pmetis</em> and <em>kmetis</em> on the most important task, graph partitioning, on which the other tasks rely. In order to keep the benchmarks fair, the same input data is processed by every &#8220;implementation&#8221; (i.e. by each different compilation of the sci-libs/metis package). The module is already stable and usable, although some more tests could be added.</p>
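<p>Timing an external executable from Python can be done along these lines. This is a sketch and not the code of metis.py: the function name is made up, taking the best of several repetitions is an assumption, and a harmless stand-in command is used so that the example runs anywhere.</p>

```python
import subprocess
import sys
import time


def time_command(command, repetitions=3):
    """Run `command` several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repetitions):
        start = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in command so that the sketch runs anywhere; a real run would name
# the benchmarked executable and its input file here instead.
elapsed = time_command([sys.executable, "-c", "pass"])
```

<p>Passing the same command line and the same input file to every compiled variant keeps the comparison fair, as described above.</p>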
<h4>LAPACK performance tests</h4>
<p>We decided to benchmark matrix decompositions rather than full solvers, as the decompositions are the real core of each LAPACK solver, while the substitutions and transposed matrix-vector multiplication are BLAS tasks that have to be benchmarked separately. Therefore the following tests have been deprecated &#8212; they are still available, but are not part of the standard set:</p>
<ul>
<li>general_solve: General linear system of equations solver</li>
<li>least_squares: General linear least squares solver</li>
<li>symm_ev: Symmetric matrix eigensolver &#8212; eigenvalues only</li>
</ul>
<div>Some new tests have been added, and the following are the resulting standard tests:</div>
<div>
<ul>
<li>lu_decomp: LU decomposition</li>
<li>cholesky: Cholesky decomposition of a SPD matrix</li>
<li>qr_decomp (new): QR decomposition</li>
<li>svd_decomp (new): Singular value decomposition</li>
<li>syev (new): Symmetric matrix eigensolver (eigenvalues and eigenvectors), full diagonalization</li>
<li>stev (new): Tridiagonal matrix eigensolver (eigenvalues and eigenvectors), full diagonalization</li>
</ul>
</div>
<h4>LAPACK accuracy tests</h4>
<p>Following the strategy of the blas_accuracy.py module, a new and much more interesting module has been added for testing the accuracy of the different LAPACK implementations. It includes all the standard LAPACK tests (see the section above). The matrix decompositions are tested by multiplying the resulting factors and comparing the product to the original matrix; the eigensolvers are treated as decompositions (diagonalization) and tested the same way. Some reports are available <a href="http://www.phys.ethz.ch/~arteagaa/soc/lapack_accuracy/">on my homepage</a>.</p>
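<p>The &#8220;multiply the factors and compare&#8221; check can be illustrated on a Cholesky decomposition. The sketch below uses a small pure-Python factorization in place of a LAPACK call, and the largest absolute entry of the difference as error measure; both choices are assumptions made for illustration, not necessarily what the module does.</p>

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T = a (a is a list of rows)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def reconstruction_error(a, low):
    """Largest absolute entry of a - L * L^T."""
    n = len(a)
    return max(
        abs(a[i][j] - sum(low[i][k] * low[j][k] for k in range(n)))
        for i in range(n)
        for j in range(n)
    )


# Small SPD example matrix chosen by hand for illustration.
a = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
error = reconstruction_error(a, cholesky(a))
```

<p>The smaller the error, the more accurate the factorization; comparing this number across implementations is the point of an accuracy benchmark.</p>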
<h3>Plan for the next weeks</h3>
<p>The GSoC programme is reaching its end. Next week is the soft &#8220;pencils down&#8221; date. Therefore I will spend the next week doing only tests and bugfixes, writing documentation and performing other &#8220;administrative&#8221; tasks (ebuild refinement, repository management,&#8230;). On Friday the unstable branch of the repository will be merged into the stable one. Then a full report will be written and sent to the mailing list.</p>
<p>After the soft &#8220;pencils down&#8221; date, only critical bugs will be fixed and more documentation will be added. Then, I will release a stable, tested, documented and supported version of my suite just before the hard &#8220;pencils down&#8221; date.</p>
<p>I would appreciate <strong>very much</strong> a helping hand in testing my script! If somebody is interested in having a look at it, it is very easy to install the required packages and run some tests. So please, if you have some time, let it run and report any bugs that you find to me; if you don&#8217;t find bugs, please write to me as well! I will write a short howto for running the tests and send the link to the mailing list.</p>
<p>Best regards<br />
Andrea Arteaga</p>
]]></content:encoded>
			<wfw:commentRss>http://andyspiros.wordpress.com/2011/08/08/benchmarking-suite-%e2%80%93-report-11/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2094e3540201216badc3c0e707183d7a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andyspiros</media:title>
		</media:content>
	</item>
		<item>
		<title>Benchmarking suite – Report 10</title>
		<link>http://andyspiros.wordpress.com/2011/08/03/benchmarking-suite-report-10/</link>
		<comments>http://andyspiros.wordpress.com/2011/08/03/benchmarking-suite-report-10/#comments</comments>
		<pubDate>Tue, 02 Aug 2011 23:55:41 +0000</pubDate>
		<dc:creator>andyspiros</dc:creator>
				<category><![CDATA[Google Summer of Code]]></category>
		<category><![CDATA[gentoo]]></category>
		<category><![CDATA[soc]]></category>

		<guid isPermaLink="false">http://andyspiros.wordpress.com/?p=532</guid>
		<description><![CDATA[This is the report of the project &#8220;Automated benchmark suite for numerical libraries in Gentoo&#8221; for the week 25 July-1 August. Project description The project aims to develop a simple yet powerful automated system of benchmarking for numerical libraries. The Gentoo software system provides many implementations of widely used standards such as BLAS, CBLAS, LAPACK, [...]]]></description>
			<content:encoded><![CDATA[<p>This is the report of the project &#8220;Automated benchmark suite for numerical libraries in Gentoo&#8221; for the week 25 July-1 August.</p>
<h3>Project description</h3>
<p>The project aims to develop a simple yet powerful automated benchmarking system for numerical libraries. The Gentoo software system provides many implementations of widely used standards such as BLAS, CBLAS, LAPACK and ScaLAPACK, as well as other numerical libraries such as FFTW and MKL. The developed tools will help the system maintainer choose the best-suited implementation for the machine hardware and test the same implementation, or different ones, with different compilers, compiler versions and compile flags.<span id="more-532"></span></p>
<h3>Progress during the week</h3>
<p>This week has been devoted to the following features:</p>
<ul>
<li>Management of the dependencies</li>
<li>Improvement of the PBLAS/ScaLAPACK tests and investigation of some BTL-related problem &#8212; see below</li>
<li>3-Dimensional FFTW tests</li>
</ul>
<p>Before this week&#8217;s work, only packages without unsatisfied dependencies on the main system could be tested. For example, without cmake installed on the system one could not test eigen, as it requires cmake in order to compile. This is not a problem anymore: the dependencies are emerged into the &#8220;sandbox&#8221; root along with the package to be tested, and the environment is adjusted in order to make use of the software installed there. The delay of this report is due to this work, which required more effort than expected (a small rewrite of the preliminary part of the script was needed); now the report can be complete.</p>
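<p>As a rough illustration of what adjusting the environment for a sandbox root can look like, here is a minimal Python sketch. It is not taken from the script: the function name sandbox_environment, the example root path and the choice of variables (PATH, LD_LIBRARY_PATH, PKG_CONFIG_PATH) are assumptions made for the example.</p>

```python
import os


def sandbox_environment(root, base_env=None):
    """Return a copy of the environment that prefers software under `root`.

    Hypothetical helper: the variables and sub-directories are assumptions,
    the real script may adjust a different set.
    """
    env = dict(os.environ if base_env is None else base_env)
    prefixes = {
        "PATH": ["usr/bin", "bin"],
        "LD_LIBRARY_PATH": ["usr/lib", "lib"],
        "PKG_CONFIG_PATH": ["usr/lib/pkgconfig"],
    }
    for name, subdirs in prefixes.items():
        entries = [os.path.join(root, subdir) for subdir in subdirs]
        previous = env.get(name)
        if previous:
            # Keep the system value as a fallback after the sandbox entries.
            entries.append(previous)
        env[name] = os.pathsep.join(entries)
    return env


# Example root path, chosen only for illustration.
env = sandbox_environment("/var/tmp/autobench/root", {"PATH": "/usr/bin"})
print(env["PATH"])
```

<p>The resulting dictionary could then be passed as the environment of the child process that compiles or runs a test, so that the main system is left untouched.</p>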
<p>Some investigation has been done regarding the input matrices and vectors for the tests. In order to be fair, the tests should feed the same input to all libraries, as different input could require a different amount of work. A linear congruential random number generator has been written, along with a new-style matrix generator. This will not completely solve the problem, but it represents a step in the right direction. The PBLAS/ScaLAPACK tests now use this method, and the results are satisfactory.</p>
<p>The work with FFTW is finished for now. 1-dimensional, 2-dimensional and 3-dimensional complex discrete Fourier transforms are tested in both forward and backward versions and with the &#8220;estimate&#8221; and &#8220;measure&#8221; strategies. If time allows, I will also add sine and cosine transforms.</p>
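<p>A linear congruential generator is small enough to sketch here. The following Python version only illustrates the principle (the actual generator lives in the C++ test suite, and its constants are not stated in this report): the class and function names are hypothetical and the constants are the well-known ones from Numerical Recipes.</p>

```python
class LinearCongruential:
    """x_{n+1} = (a * x_n + c) mod m with the Numerical Recipes constants.

    The constants are an assumption for this sketch, not those of the suite.
    """
    MODULUS = 2 ** 32
    MULTIPLIER = 1664525
    INCREMENT = 1013904223

    def __init__(self, seed):
        self.state = seed % self.MODULUS

    def next_float(self):
        """Advance the state and return a float in [0, 1)."""
        self.state = (self.MULTIPLIER * self.state + self.INCREMENT) % self.MODULUS
        return self.state / self.MODULUS


def generate_matrix(rows, cols, seed):
    """Fill a rows x cols matrix from a generator started at `seed`."""
    rng = LinearCongruential(seed)
    return [[rng.next_float() for _ in range(cols)] for _ in range(rows)]


# The same seed always yields the same matrix, whichever library is tested.
first = generate_matrix(3, 3, seed=42)
second = generate_matrix(3, 3, seed=42)
other = generate_matrix(3, 3, seed=43)
print(first == second, first == other)
```

<p>Because the generator is fully determined by its seed, two implementations benchmarked with the same seed receive identical input.</p>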
<h3>Plan for the next week</h3>
<p>During the following week I will implement the tests for METIS. This is a set of executables and libraries that compute graph and mesh partitionings (along with other numerical functions), which are widely used in numerical simulations and other scientific computations.</p>
<p>The project has gained many features in the last weeks. After the METIS tests are implemented, the quality of the whole project will be tested and improved. A considerable part of the time has been devoted to the BTL library, and more work will be done in order to submit the new code upstream. Some work also has to be done to make the script more user-friendly; this means handling the exceptions that sometimes occur, avoiding conflicts between tests already done and the current ones, and reducing the probability of errors. Much documentation has to be written. The ebuild will be split into two parts, a Gentoo-specific one and a general one containing the modified BTL.</p>
<p>All this work will eventually provide a finished and usable project, which is my primary objective for the programme. I do not want to deliver incomplete or undocumented software; if time runs short, I would rather postpone the inclusion of new features until after the end of the programme.</p>
<p>Best regards<br />
Andrea Arteaga</p>
]]></content:encoded>
			<wfw:commentRss>http://andyspiros.wordpress.com/2011/08/03/benchmarking-suite-report-10/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2094e3540201216badc3c0e707183d7a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andyspiros</media:title>
		</media:content>
	</item>
		<item>
		<title>Benchmarking suite – Report 9</title>
		<link>http://andyspiros.wordpress.com/2011/07/25/benchmarking-suite-report-9/</link>
		<comments>http://andyspiros.wordpress.com/2011/07/25/benchmarking-suite-report-9/#comments</comments>
		<pubDate>Sun, 24 Jul 2011 22:38:54 +0000</pubDate>
		<dc:creator>andyspiros</dc:creator>
				<category><![CDATA[Google Summer of Code]]></category>
		<category><![CDATA[gentoo]]></category>
		<category><![CDATA[soc]]></category>

		<guid isPermaLink="false">http://andyspiros.wordpress.com/?p=502</guid>
		<description><![CDATA[This is the report of the project &#8220;Automated benchmark suite for numerical libraries in Gentoo&#8221; for the week 18-24 July. Project description The project aims to develop a simple yet powerful automated system of benchmarking for numerical libraries. The Gentoo software system provides many implementations of widely used standards such as BLAS, CBLAS, LAPACK, ScaLAPACK [...]]]></description>
			<content:encoded><![CDATA[<p>This is the report of the project &#8220;Automated benchmark suite for numerical libraries in Gentoo&#8221; for the week 18-24 July.</p>
<h3>Project description</h3>
<p>The project aims to develop a simple yet powerful automated benchmarking system for numerical libraries. The Gentoo software system provides many implementations of widely used standards such as BLAS, CBLAS, LAPACK and ScaLAPACK, as well as other numerical libraries such as FFTW and MKL. The developed tools will help the system maintainer choose the best-suited implementation for the machine hardware and test the same implementation, or different ones, with different compilers, compiler versions and compile flags.<span id="more-502"></span></p>
<h3>Progress during the week</h3>
<p>This week the project has gained the following features:</p>
<ul>
<li>Work on the FFTW module. Now the following two-dimensional actions are available:</li>
<ul>
<li>FFTW_2D_Forward_Measure</li>
<li>FFTW_2D_Forward_Estimate</li>
<li>FFTW_2D_Backward_Measure</li>
<li>FFTW_2D_Backward_Estimate</li>
</ul>
<li>Work on the PBLAS/ScaLAPACK module:</li>
<ul>
<li>Parallel axpy</li>
<li>Parallel matrix-vector multiply</li>
<li>Parallel LU decomposition</li>
<li>Parallel Cholesky decomposition</li>
<li>Parallel QR decomposition</li>
<li>Parallel SVD decomposition</li>
<li>Parallel eigenvalues/eigenvectors computation</li>
</ul>
</ul>
<p>Regarding the ScaLAPACK actions, some more work is needed in order to prevent singular or non-SPD matrices from being processed. This will be part of next week&#8217;s work. The 2-dimensional FFTW actions work well, and the module will soon gain the 3-dimensional actions as well.</p>
<h3>Plan for the next week</h3>
<p>The issue with ScaLAPACK raised another problem, too: every implementation that is tested should receive the same input matrices/vectors, in order to make the tests fair. Therefore a decision has been taken: the matrices will be generated by a deterministic random number generator (probably a linear congruential one) and the tests will share the seeds. The seeds will be taken from a set of seeds known to generate valid matrices (e.g. SPD matrices for the algorithms that require them). This will be an important part of the work.</p>
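<p>One possible way to build such a set of valid seeds is to screen candidate seeds in advance and keep only those whose matrix passes a Cholesky factorization. The Python sketch below shows the idea under several assumptions: the function names, the inline generator constants and the way the symmetric matrix is filled are hypothetical and not taken from the project.</p>

```python
import math


def symmetric_matrix(n, seed):
    """Symmetric n x n matrix with entries in [-1, 1) from a small inline LCG."""
    state = seed
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            state = (1664525 * state + 1013904223) % 2 ** 32
            a[i][j] = a[j][i] = 2.0 * state / 2 ** 32 - 1.0
    return a


def is_spd(a):
    """Try a Cholesky factorization; it succeeds only for (numerically) SPD input."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if not s > 0.0:
                    return False  # non-positive pivot: not positive definite
                chol[i][j] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return True


def valid_seeds(n, candidates):
    """Keep the candidate seeds whose generated matrix is SPD."""
    return [seed for seed in candidates if is_spd(symmetric_matrix(n, seed))]


seeds = valid_seeds(2, range(200))
print(len(seeds), "of 200 candidate seeds give an SPD 2x2 matrix")
```

<p>The screening is done once; the benchmarks then only draw from the stored list, so every implementation sees the same valid input.</p>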
<p>Another part regards the package dependencies. At this moment the script does not handle package dependencies: it only installs the desired packages into some specific root, but fails when the package has dependencies which are not installed in the system. The script will be adapted in order to also install the dependencies into the same root and manage the environment variables.</p>
<p>The last part of the week plan regards the input configuration file. When testing a dependent library (e.g. LAPACK, which depends on BLAS and/or CBLAS), the user should be able to select as the implementation of the dependency (BLAS or CBLAS in this case) not only an installed one, but also an implementation that has been tested.</p>
<p>Best regards<br />
Andrea Arteaga</p>
]]></content:encoded>
			<wfw:commentRss>http://andyspiros.wordpress.com/2011/07/25/benchmarking-suite-report-9/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2094e3540201216badc3c0e707183d7a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andyspiros</media:title>
		</media:content>
	</item>
		<item>
		<title>Benchmarking suite &#8211; Report 8</title>
		<link>http://andyspiros.wordpress.com/2011/07/18/benchmarking-suite-report-8/</link>
		<comments>http://andyspiros.wordpress.com/2011/07/18/benchmarking-suite-report-8/#comments</comments>
		<pubDate>Mon, 18 Jul 2011 01:22:36 +0000</pubDate>
		<dc:creator>andyspiros</dc:creator>
				<category><![CDATA[Google Summer of Code]]></category>
		<category><![CDATA[gentoo]]></category>
		<category><![CDATA[soc]]></category>

		<guid isPermaLink="false">http://andyspiros.wordpress.com/?p=496</guid>
		<description><![CDATA[A very short log of what has been done during the last days. I am working on the PBLAS and ScaLAPACK benchmarks, which is a very challenging topic, because it is very difficult to debug such applications. * I changed some parts of the BTL framework, adapting it to the distributed memory benchmarks. This has [...]]]></description>
			<content:encoded><![CDATA[<p>A very short log of what has been done during the last days. I am<br />
working on the PBLAS and ScaLAPACK benchmarks, which is a very<br />
challenging topic, because it is very difficult to debug such<br />
applications.</p>
<p>* I changed some parts of the BTL framework, adapting it to the<br />
distributed memory benchmarks. This has required writing two new<br />
perfanalyzers &#8212; one for the root process, one for the other (node)<br />
processes. The nodes do not perform any measurement, while the root<br />
process broadcasts the needed information, measures the time and<br />
manages the output (both std{out,err} and resulting file).</p>
<p>* I added a BLACS library that provides a useful interface which<br />
scatters and gathers matrices and vectors. I also added a PBLAS<br />
library that inherits the BLACS one and will support the most common<br />
operations (at the moment just the parallel matrix-vector<br />
multiplication).</p>
<p>* I added an action for the parallel matrix-vector multiplication<br />
which makes use of the two described interfaces.</p>
<p>The matrix-vector multiplication is a case study for now. If<br />
everything goes fine (and it seems so, now), then more actions will be<br />
provided, for both PBLAS and ScaLAPACK, which share the same concepts.<br />
I plan to have tomorrow a working (but incomplete) Python module for<br />
PBLAS, too.</p>
<p>Milestones for the next week:<br />
* Have working PBLAS and ScaLAPACK modules<br />
* Do some benchmarks using these modules and publish the results<br />
* Start the implementation of the advanced FFTW benchmarks, as<br />
previously described</p>
<p>Best regards<br />
<span style="color:#888888;">Andrea Arteaga</span></p>
]]></content:encoded>
			<wfw:commentRss>http://andyspiros.wordpress.com/2011/07/18/benchmarking-suite-report-8/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2094e3540201216badc3c0e707183d7a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andyspiros</media:title>
		</media:content>
	</item>
		<item>
		<title>Mid-term report</title>
		<link>http://andyspiros.wordpress.com/2011/07/13/mid-term-report/</link>
		<comments>http://andyspiros.wordpress.com/2011/07/13/mid-term-report/#comments</comments>
		<pubDate>Wed, 13 Jul 2011 10:00:34 +0000</pubDate>
		<dc:creator>andyspiros</dc:creator>
				<category><![CDATA[Google Summer of Code]]></category>
		<category><![CDATA[gentoo]]></category>
		<category><![CDATA[soc]]></category>

		<guid isPermaLink="false">http://andyspiros.wordpress.com/?p=481</guid>
		<description><![CDATA[This report presents the status of the Google Summer of Code project Automated benchmark suite for numerical libraries in Gentoo before the mid-term evaluation. Project description The project aims to develop a simple yet powerful automated system of benchmarking for numerical libraries. The Gentoo software system provides many implementations of widely used standards such as BLAS, [...]]]></description>
			<content:encoded><![CDATA[<p>This report presents the status of the Google Summer of Code project <em>Automated benchmark suite for numerical libraries in Gentoo</em> before the mid-term evaluation.</p>
<h3>Project description</h3>
<p>The project aims to develop a simple yet powerful automated benchmarking system for numerical libraries. The Gentoo software system provides many implementations of widely used standards such as BLAS, CBLAS, LAPACK and ScaLAPACK, as well as other numerical libraries such as FFTW and MKL. The developed tools will help the system maintainer choose the best-suited implementation for the machine hardware and test the same implementation, or different ones, with different compilers, compiler versions and compile flags.<span id="more-481"></span></p>
<h3>Status of the project</h3>
<p>A set of Python scripts and a set of benchmarking suites written in C++ are provided. These tools are able to perform the following tasks:</p>
<ul>
<li>Benchmark different implementations of the standard libraries BLAS, CBLAS, LAPACK</li>
<li>Benchmark the FFTW library</li>
<li>For each test choose different packages or package versions and for each package or version provide a customized compile environment</li>
<li>Customize the kinds of tests to be performed</li>
<li>Generate an HTML report with plots regarding the performed tests in PNG format</li>
</ul>
<p>The following features are provided:</p>
<ul>
<li>Error management: even if a benchmark crashes, the script continues with the next task, providing an error message; the failed test will be ignored when generating the reports</li>
<li>Complete logging system: everything relevant is logged in a directory, from the emerging of a package to the benchmarking suite execution</li>
<li>Human readable output to the terminal as interface to the more low-level underlying test suites</li>
<li>Ability to skip specific implementations</li>
<li>Ability to select a specific implementation for the libraries used by the tested libraries (for example, many LAPACK implementations rely on BLAS and CBLAS libraries)</li>
<li>Compiled packages, test results, logs and reports are saved. The tests are not run again if all needed results already exist</li>
<li>The high-quality benchmarking system BTL is used</li>
</ul>
<h3>Using the software</h3>
<h4>Installing the package</h4>
<p>First of all, the repository <em>bicatali</em> has to be installed. This can be done through layman. This repository provides updated packages for every numerical library and a patched eselect version. If you have any numerical libraries installed, you will have to remove them and install the versions provided by the <em>bicatali</em> overlay.</p>
<p>The repository git://git.overlays.gentoo.org/proj/auto-numerical-bench.git contains a portage tree with the ebuild for installing the software. One can set up the repository using layman (by adapting the configuration file) or by checking the repository out and adding it to the PORTDIR_OVERLAY environment variable. The package has the name autobench. Until the first release, only the version 9999 is provided, with the keywords ~x86 and ~amd64 (it has only been tested on amd64, though). The autobench package will install the files under /usr/(libdir)/autobench and a symlink into /usr/bin.</p>
<p>In order to run the script, type <em>autobench</em> followed by the arguments described below.</p>
<h4>Configuring the tests</h4>
<p>In order to run the benchmarks, a configuration file has to be provided. In this file every line defines a package containing implementations to be tested. The line begins with a single-word label that serves as identifier. The second token is the package; every expression that is accepted by emerge works. After that, environment variables and other flags can be appended. A line starting with # is treated as a comment and skipped.</p>
<p>For example, to test BLAS implementations, one might want to emerge the packages atlas, eigen, openblas and acml. One might also want to test different versions of atlas with different gcc versions (or icc). An example configuration file could be the following:</p>
<p><pre class="brush: plain;">
#ATLAS
atlas-3.8 sci-libs/atlas-3.8.4 CFLAGS=&quot;-O3 -march=native&quot;
atlas-3.9 sci-libs/atlas-3.9.41 CFLAGS=&quot;-O3 -march=native&quot;
atlas-3.9_gcc-4.6.1 sci-libs/atlas-3.9.41 CFLAGS=&quot;-O3 -march=native&quot; CC=&quot;gcc-4.6&quot;

# ACML
acml sci-libs/acml-4.4.0-r1 -acml32-gfortran -acml32-gfortran-openmp
</pre></p>
<p>Notice that gcc-4.6.1 must be installed in the system. As every package can install more than one implementation, every actual test is referenced through the following string: <em>line-identifier</em>/<em>implementation</em>. For example, sci-libs/atlas installs the implementations <em>atlas</em> and <em>atlas-threads</em>. We therefore have six tests regarding atlas, which are identified by:</p>
<ul>
<li>atlas-3.8/atlas</li>
<li>atlas-3.8/atlas-threads</li>
<li>atlas-3.9/atlas</li>
<li>atlas-3.9/atlas-threads</li>
<li>atlas-3.9_gcc-4.6.1/atlas</li>
<li>atlas-3.9_gcc-4.6.1/atlas-threads</li>
</ul>
<p>The ACML package installs different implementations depending on the USE flags. Let us assume in our example that it installs four:</p>
<ul>
<li>acml32-gfortran</li>
<li>acml32-gfortran-openmp</li>
<li>acml64-gfortran</li>
<li>acml64-gfortran-openmp</li>
</ul>
<p>with both 32-bit and 64-bit profiles. As we only want to test the 64-bit versions, we can add the strings <em>-acml32-gfortran</em> and <em>-acml32-gfortran-openmp</em> in order to exclude these implementations from the tests.</p>
<div>
<p>Another possible argument controls which libraries are used. For example, the LAPACK reference implementation (package lapack-reference) makes use of the BLAS routines, which can come from any of the provided BLAS implementations. One could therefore test the performance of this package when using the eigen BLAS implementation and when using the openblas-threads one:</p>
<p><pre class="brush: plain;">
reference_eigen sci-libs/lapack-reference-3.3.1-r1 blas:eigen
reference-openblas sci-libs/lapack-reference-3.3.1-r1 blas:openblas-threads
</pre></p>
<p>The arguments blas:implementation instruct the script to use the desired implementation when running the suite. If none is specified (e.g. here no cblas implementation is specified), then the standard one will be picked. Notice that the desired implementation (in this case eigen and openblas-threads) has to be installed in the system.</p>
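<p>To make the line format concrete, here is a minimal Python sketch of how such a line could be split into its parts. It is not the parser used by the script: parse_config_line and the keys of the returned dictionary are hypothetical, and the classification rules (KEY=VALUE is an environment variable, a leading minus excludes an implementation, library:implementation selects a dependency) are simply read off the examples above.</p>

```python
import shlex


def parse_config_line(line):
    """Split one configuration line into label, package and options.

    Returns None for blank lines and comments. Hypothetical helper.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # shlex keeps quoted values such as CFLAGS="-O3 -march=native" together.
    label, package, *options = shlex.split(stripped)
    entry = {"label": label, "package": package,
             "env": {}, "skip": [], "requires": {}}
    for option in options:
        if "=" in option:
            name, value = option.split("=", 1)
            entry["env"][name] = value
        elif option.startswith("-"):
            entry["skip"].append(option[1:])
        elif ":" in option:
            library, implementation = option.split(":", 1)
            entry["requires"][library] = implementation
    return entry


atlas = parse_config_line(
    'atlas-3.9_gcc-4.6.1 sci-libs/atlas-3.9.41 CFLAGS="-O3 -march=native" CC="gcc-4.6"')
lapack = parse_config_line(
    "reference_eigen sci-libs/lapack-reference-3.3.1-r1 blas:eigen")
print(atlas["env"], lapack["requires"])
```

<p>Using shlex.split rather than a plain str.split matters here because the CFLAGS value contains a space inside the quotes.</p>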
<h4>Running the tests</h4>
<p>Once a configuration file has been created (call it conffile.in here), the tests can be run. The call synopsis is the following:</p>
<p><pre class="brush: plain;">
autobench library conffile.in arguments
</pre></p>
<p>where <em>library</em> is the module to be tested (e.g. blas, cblas, lapack, fftw); <em>conffile.in</em> is the configuration file described above, which lists the implementations of the desired library; <em>arguments</em> are the module-dependent arguments, usually the numerical tests to be performed. The following is a list of accepted arguments for every provided module. Every module accepts the two flags -s or --summary and -S or --summary-only:</p>
<ul>
<li>-s and --summary enable the summary figure, which is a single figure with many plots (subplots) that summarizes the tests. As the legend on such small plots often hides the lines, it is not displayed if standard plots are present (i.e. if the -S argument is not given)</li>
<li>-S and --summary-only enable the summary figure and disable the standard, single-plot figures. This also enables the legends on the summary figure.</li>
</ul>
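<p>The interplay of the two summary flags can be illustrated with a small argparse sketch. This is an assumption about how such a command line could be modelled, not the actual argument handling of autobench; build_parser and figure_plan are hypothetical names.</p>

```python
import argparse


def build_parser():
    """Command line following the synopsis: library, conffile, then tests."""
    parser = argparse.ArgumentParser(prog="autobench")
    parser.add_argument("library")
    parser.add_argument("conffile")
    parser.add_argument("tests", nargs="*")
    parser.add_argument("-s", "--summary", action="store_true")
    parser.add_argument("-S", "--summary-only", action="store_true")
    return parser


def figure_plan(args):
    """Decide which figures are drawn and whether the summary gets a legend."""
    summary = args.summary or args.summary_only
    standard = not args.summary_only
    return {
        "standard_figures": standard,
        "summary_figure": summary,
        # The legend is shown on the summary only when it is the sole output.
        "summary_legend": summary and not standard,
    }


args = build_parser().parse_args(
    ["blas", "conffile.in", "axpy", "matrix_matrix", "-s"])
plan = figure_plan(args)
print(args.tests, plan)
```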
<h5>BLAS</h5>
<p>The module blas.py accepts the following tests as arguments:</p>
<ul>
<li>axpy</li>
<li>axpby</li>
<li>rot</li>
<li>matrix_vector</li>
<li>atv</li>
<li>symv</li>
<li>ger</li>
<li>syr2</li>
<li>trisolve_vector</li>
<li>matrix_matrix</li>
<li>aat</li>
<li>trisolve_matrix</li>
<li>trmm</li>
</ul>
<p>The same tests are accepted by the cblas.py module.</p>
<p>If no test is given as argument, then the following four standard tests are selected:</p>
<div>
<ul>
<li>axpy</li>
<li>matrix_vector</li>
<li>trisolve_vector</li>
<li>matrix_matrix</li>
</ul>
</div>
<h5>LAPACK</h5>
<p>The lapack module accepts the following tests as arguments:</p>
<ul>
<li>general_solve</li>
<li>least_squares</li>
<li>lu_decomp</li>
<li>cholesky</li>
<li>symm_ev</li>
</ul>
<p>If no arguments are given, then all tests are selected.</p>
<h5>FFTW</h5>
<p>The fftw.py module accepts the following tests as arguments:</p>
<ul>
<li>FFTW_1D_Forward_Measure</li>
<li>FFTW_1D_Forward_Estimate</li>
<li>FFTW_1D_Backward_Measure</li>
<li>FFTW_1D_Backward_Estimate</li>
</ul>
<p>If no arguments are given, then all tests are selected.</p>
<h5>BLAS_ACCURACY</h5>
<p>The blas_accuracy module accepts the following tests as arguments:</p>
<ul>
<li>axpy</li>
<li>matrix_vector</li>
<li>trisolve_vector</li>
<li>matrix_matrix</li>
</ul>
<p>If no arguments are given, then all tests are selected.</p>
<h3>Results</h3>
<p>On the page <a href="http://www.phys.ethz.ch/~arteagaa/soc/">http://www.phys.ethz.ch/~arteagaa/soc/</a> some report examples generated by the script are available.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://andyspiros.wordpress.com/2011/07/13/mid-term-report/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2094e3540201216badc3c0e707183d7a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andyspiros</media:title>
		</media:content>
	</item>
	</channel>
</rss>
