PMR^3: templated eigensolver for Elemental #189
base: master
Conversation
Elemental assumed C++11 in late 2013, according to my email archives.
I'm not sure exactly how you've made changes here, but it appears that you are "moving" files. I would strongly recommend making use of …
@jeffhammond thanks!
There is an insane number of warnings in the Clang build, caused by C code translated from Fortran. It may be worth fixing them, because they will appear in the compilation of every translation unit.
I haven't had a chance to step through this yet, but this is a much-needed addition. And I fully agree that it would be better to never use f2c. My recent experience with implementing tridiagonal and bidiagonal divide and conquer natively in Elemental shows that starting from the bottom up is not as much work as one would think.
@@ -0,0 +1,115 @@
/* odcpy.f -- translated by f2c (version 20061008) */
Wouldn't it be easier to just have a for loop instead of calling odcpy?
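For context, odcpy appears to be the f2c translation of the BLAS routine dcopy (a strided vector copy), so the suggested loop is short. A minimal sketch, assuming positive strides and a dcopy-like interface (the actual signature in this PR may differ; dcopy additionally walks the vectors backwards for negative strides):

// Hypothetical replacement for the f2c-translated odcpy: a plain
// strided copy. Assumes incx and incy are positive; the negative-stride
// handling of BLAS dcopy is omitted here.
template<typename FloatingType>
void Copy(int n, const FloatingType* x, int incx, FloatingType* y, int incy)
{
    for(int i = 0; i < n; ++i)
        y[i*incy] = x[i*incx];
}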
@@ -0,0 +1,107 @@
/**
Same for this routine. I vote for the obvious two-line trivial implementation.
namespace pmrrr { namespace lapack {
template<typename FloatingType> |
This routine seems like low-hanging fruit for a ground-up implementation as well; also, how will the fabs function behave for non-standard datatypes (e.g., El::BigFloat)?
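One common way to keep the absolute value generic is an unqualified call resolved through argument-dependent lookup, so that a user-defined type (such as El::BigFloat) can supply its own overload. A minimal sketch; GenericAbs is a hypothetical name, not anything in Elemental or this PR:

#include <cmath>

// Sketch of a generic absolute value: std::abs covers the built-in
// floating-point types, while the unqualified call lets ADL find an
// abs() overload defined alongside a non-standard datatype.
template<typename FloatingType>
FloatingType GenericAbs(const FloatingType& x)
{
    using std::abs;
    return abs(x);
}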
Good point; to be honest, I haven't paid much attention to non-standard datatypes.
Is it fine if I use the existing MPI and math function wrappers from Elemental? Or do you prefer PMR^3 to be independent of Elemental? I can implement simple serialization there if necessary.
In an ideal world, it would be part of the main library and use proper C++ functions instead of f2c, but the latter may take a substantial amount of work. You may want to look at how I handled similar issues in ElSuiteSparse in external/suite_sparse.
@poulson The templated eigensolver is not new; it is still based on the existing PMR^3 implementation, which used f2c extensively. I agree that the code is horrible to read and some changes may be necessary. For example, the detection of character encoding in olsame could be replaced with tools from the standard library alone.
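Assuming olsame mirrors LAPACK's LSAME (a case-insensitive character comparison whose translated code branches on the machine's character encoding), a standard-library version could be as small as the following sketch, since std::toupper already accounts for the execution character set:

#include <cctype>

// Sketch of a portable replacement for olsame: compare two characters
// case-insensitively, letting the standard library handle the encoding.
inline bool Lsame(char ca, char cb)
{
    return std::toupper(static_cast<unsigned char>(ca)) ==
           std::toupper(static_cast<unsigned char>(cb));
}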
@rhl- @mcopik For what it's worth, …
typedef double Real;
typedef Complex<Real> C;
template<typename Real>
void run_example(Int n, bool print)
It isn't properly documented, but the examples/ folder is meant to be more for demonstrating functionality, and the tests/ folder is meant for correctness tests.
As an aside, Elemental uses CamelCase for function names rather than snake_case.
It might also be preferred to test both single-precision and double-precision in the same run (with the ideal case being able to individually disable each precision).
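A sketch of what individually togglable precisions could look like with runtime switches, using Elemental's command-line input helpers; run_example and the flag names are illustrative, not part of this PR:

#include <El.hpp>

template<typename Real>
void run_example(El::Int n, bool print); // the example body from this PR

int main(int argc, char* argv[])
{
    El::Environment env(argc, argv);
    try
    {
        const El::Int n = El::Input("--n","matrix size",100);
        const bool print = El::Input("--print","print matrices?",false);
        // Hypothetical switches so each precision can be disabled at runtime
        const bool testFloat = El::Input("--testFloat","test single precision?",true);
        const bool testDouble = El::Input("--testDouble","test double precision?",true);
        El::ProcessInput();
        if(testFloat) run_example<float>(n,print);
        if(testDouble) run_example<double>(n,print);
    }
    catch(std::exception& e) { El::ReportException(e); }
    return 0;
}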
Thanks for the comment, I'm going to fix that.
By being able to disable each precision, do you mean a compilation flag or a runtime argument?
option(HAVE_SPINLOCKS "Enable if pthread lib supports spinlocks" OFF)
MARK_AS_ADVANCED(HAVE_SPINLOCKS)
if(NOT HAVE_SPINLOCKS)
  add_definitions(-DNOSPINLOCKS)
  set(pmrrr_defines "${pmrrr_defines}#define NOSPINLOCKS\n")
What is the reason for this change?
Those defines were used as flags when building PMR^3 as a standalone library. Now most of the code (and 100% of the code relying on the availability of pthreads and spinlocks) has been moved into templated code included by Elemental. To keep the configuration flags available, I changed one of the PMR^3 headers into a CMake configuration file, installed in both the build and install directories.
That's why those flags are accumulated and used to configure the header, rather than being passed directly to the compiler.
Thank you for the detailed reply; though it seems it would be more usual to use cmakedefine within the configure file rather than explicit string includes.
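For reference, the cmakedefine approach replaces the string accumulation into ${pmrrr_defines} with a template line in the configured header. A minimal sketch, where the file names are assumptions:

# In include/pmrrr_config.h.in:
#   #cmakedefine NOSPINLOCKS
# configure_file() turns that line into "#define NOSPINLOCKS" when the
# CMake variable NOSPINLOCKS is set, and into a commented-out #undef
# otherwise.

# In CMakeLists.txt:
if(NOT HAVE_SPINLOCKS)
  set(NOSPINLOCKS TRUE)
endif()
configure_file(include/pmrrr_config.h.in
               ${CMAKE_BINARY_DIR}/include/pmrrr_config.h)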
Thanks! I didn't know about that feature.
This pull request introduces a templated version of PMR^3. It is the result of a student project at RWTH from last winter semester, and it has already been discussed on the mailing list in February.
Our results, generated with several tridiagonal matrices of different types (Legendre, Wilkinson, and matrices obtained from chemical problems), show a 20–50% decrease in computation time when the desired accuracy allows for single-precision computations. We didn't notice any performance change for double-precision computations after switching from pure C to C++.
I worked from the original PMR^3 repository to create a templated version and then applied the changes from Elemental's version of PMR^3. The significant differences between the C and C++ versions forced a rather manual process of patching the eigensolver.
PMR^3 requires two additional preprocessor flags (pthreads and spinlocks); to avoid polluting Elemental with extra flags, a config file for PMR^3 is created by CMake and placed in both the build and install directories.
There are a few issues to resolve; I'm also not quite sure about coding style and about properly defining imports for the templated PMR^3. All suggestions and comments are welcome.
Issues:
CC: @pauldj