std::length_error', when total nonzeros is higher than maximum of integer4, Fortran use of STRUMPACK_MPIdist #94
Comments
I found that integer8 is not necessary, so I changed back to float complex, even though my model is very large. I split the matrix across ranks so that integer4 is sufficient on each rank, but the total number of nonzeros is above the integer4 limit. Now it seems to get stuck at this step: Initializing STRUMPACK For a small model, the vector::_M_default_append error does not appear. From what I found online, it should be related to C++. |
This seems impossible to overcome, because STRUMPACK uses 32-bit indexing for BLAS, LAPACK and ScaLAPACK. However, the total number of nonzeros is over 3.5 billion, while a signed 32-bit integer can hold a maximum value of about 2.1 billion (2,147,483,647). |
I think the code is running out of memory in the column permutation phase. This uses the MC64 code, which is sequential. So the code needs to gather the whole input matrix to the root MPI process, then call the MC64 code there, and then broadcast the result. The column permutation is done to maximize the diagonal entries, but if your problem already is diagonally dominant (or has non-zero diagonal entries) then this step might not be necessary. You can try to disable it with
|
If you want to use the 64bit interface, you need to specify:
or equivalent for the different floating point precisions. There is a way to use 64 bit integers for the BLAS/LAPACK routines, but that should not be necessary. |
Thanks for your reply. Initializing STRUMPACK |
Could this just be running out of memory? Do you have an estimate for the required memory usage? Perhaps from some smaller runs you can extrapolate the memory usage to get an estimate. |
Yes, I have estimated the memory. Usually, the actual memory usage is more than 1.6 times the estimated value. Now I am testing with enough memory. In fact, in my previous tests I deliberately ran with insufficient memory, but it stopped at later steps, not at the step right after matrix equilibration. Anyhow, I will test this. Thanks very much. |
Hi, Pieter. I tried creating a banded matrix myself, and it seems the problem is not insufficient memory. I found that when the total number of nonzeros exceeds the maximum of integer4 (2,147,483,647), the error below shows up immediately, before the matrix initialization step, while below the maximum of integer4 the code runs successfully. So, does this mean I have to use the 64-bit STRUMPACK? Initializing STRUMPACK |
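One plausible way a `std::length_error` can appear once a count passes the 32-bit limit is sketched below. This is an assumption for illustration only, not a confirmed diagnosis of STRUMPACK's code: a total that wraps to a negative 32-bit value is converted to an enormous unsigned size when used to resize a `std::vector`. The helper names `wrapped_to_int32` and `resize_throws_length_error` are hypothetical. With GNU libstdc++, the message of this exception names `vector::_M_default_append`; other standard libraries use different text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: the value a two's-complement 32-bit counter would hold
// when asked to store `true_total`. Computed in 64-bit arithmetic so that no
// signed overflow (undefined behavior) actually occurs.
std::int32_t wrapped_to_int32(std::int64_t true_total) {
    assert(true_total >= 0);
    const std::int64_t two32 = std::int64_t{1} << 32;
    std::int64_t r = true_total % two32;  // in [0, 2^32)
    if (r > std::numeric_limits<std::int32_t>::max()) {
        r -= two32;  // upper half of the range maps to negative values
    }
    return static_cast<std::int32_t>(r);
}

// Hypothetical helper: size a buffer from a 32-bit count and report whether
// std::length_error was thrown.
bool resize_throws_length_error(std::int32_t nnz) {
    std::vector<int> buf;
    try {
        // A negative count converts to a size_t far above buf.max_size().
        buf.resize(static_cast<std::size_t>(nnz));
    } catch (const std::length_error&) {
        return true;
    }
    return false;
}
```

For a total of 3,500,000,000, `wrapped_to_int32` yields -794,967,296, and resizing with that value throws, while a small positive count such as 1000 resizes normally.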
Or should METIS be 64-bit, since I use METIS for reordering? |
You can try with 64bit METIS. STRUMPACK can use either 32 or 64 bit METIS. |
Does the matrix initialization step reach matrix reordering? So now I should reconfigure with 64-bit METIS as well, besides what you mentioned, right?
|
Yes, you can try METIS with 64 bit integers. STRUMPACK can use METIS with either 32 or 64 bit. |
Hi, Pieter. I find that each integer input of ''STRUMPACK_set_distributed_csr_matrix(S, c_loc(locN), c_loc(IA3), c_loc(JA3), c_loc(A2),c_loc(RowS),1)'' must be of type integer4, otherwise a segmentation fault occurs. I think it may not be a problem with BLAS/LAPACK, as you said, but a problem with the data types in the STRUMPACK interface. STRUMPACK has a step that calculates the total number of nonzeros, and this must be integer8 for my case. Is there any solution for my problem? I think a 64-bit integer STRUMPACK interface is needed, for example for STRUMPACK_set_distributed_csr_matrix, so that it can call BLAS/LAPACK or METIS. |
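The segmentation fault described above is what one would expect whenever the caller's integer width differs from the width the callee reads. The sketch below is a generic illustration, not STRUMPACK code: the helper `view_as_int32` and the row-pointer values are hypothetical. It copies the bytes of a 64-bit index array into 32-bit slots, which mimics a routine that expects 32-bit integers being handed a pointer to 64-bit data.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: reinterpret the bytes of a 64-bit index array as
// 32-bit indices. std::memcpy is used to avoid aliasing violations.
std::vector<std::int32_t> view_as_int32(const std::vector<std::int64_t>& a) {
    std::vector<std::int32_t> out(a.size() * 2);
    if (!a.empty()) {
        std::memcpy(out.data(), a.data(), a.size() * sizeof(std::int64_t));
    }
    return out;
}
```

With a hypothetical row-pointer array `{0, 3, 7}`, the second 32-bit entry read by the callee is 0 rather than 3 (on both little- and big-endian machines), so every offset derived from it is wrong and out-of-bounds accesses become likely. The integer kind in the calling code therefore has to match the interface variant that was selected.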
As I tested, the problem is in the initialization, where 64-bit integers should be used. I guess the initialization does not need BLAS/LAPACK or METIS. |
For
the arguments It could be that the total number of nonzeros is larger than the 32 bit limit, but locally on each rank the number of nonzeros fits in 32 bit. In that case, I think it will print out a negative number for the number of nonzeros, but it might still run correctly. I can try to rewrite that part of the code, so that it uses 32 bit integers everywhere except for printing the total nnz, but I need some time to do that carefully. But you can run with 64 bit integers everywhere. I don't see why that wouldn't work. |
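The situation described here, local counts that fit in 32 bits while their sum does not, can be sketched as follows. This is a minimal serial illustration under stated assumptions, not STRUMPACK's implementation: the helpers `total_nnz` and `fits_in_int32` and the per-rank counts are hypothetical, and in an MPI code the sum would be a reduction over ranks using a 64-bit datatype.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: sum per-rank nonzero counts, each stored in 32 bits,
// into a 64-bit accumulator so the total cannot wrap.
std::int64_t total_nnz(const std::vector<std::int32_t>& local_nnz) {
    std::int64_t total = 0;
    for (std::int32_t n : local_nnz) {
        assert(n >= 0);  // a count of nonzeros is never negative
        total += n;      // n is widened to 64 bits before the addition
    }
    return total;
}

// Hypothetical helper: check whether a 64-bit value is representable as a
// signed 32-bit integer.
bool fits_in_int32(std::int64_t v) {
    return v >= std::numeric_limits<std::int32_t>::min() &&
           v <= std::numeric_limits<std::int32_t>::max();
}
```

For example, seven hypothetical ranks holding 500,000,000 nonzeros each give a total of 3,500,000,000: every local count passes `fits_in_int32`, but the total does not.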
Thank you very much. Oh, I thought floatcomplex_64 meant a higher precision, like double complex. Now I know how to run it: I should use floatcomplex_64. |
With floatcomplex_64 the error changed; I think the new error is something else. Below the 32-bit limit, it seems fine: Initializing STRUMPACK
However, above the 32-bit integer limit, the error is different from before, like this: Initializing STRUMPACK BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES The difference is in the matrix equilibration, which changes from type B to type N. Do you have any idea why this difference happens? |
That should be the same. |
Actually, I didn't change anything but the number of nonzeros. Below the 32-bit limit it works; above the limit it does not. I do see 'bus error' messages on some ranks.
|
The equilibration is based on the |
Thank you. May I ask another question? Recently, my HPC manager reconfigured STRUMPACK. However, my old code no longer gets correct solutions, even though the log seems correct. Before, I had checked that the solutions were correct; with the current installation, the solutions are all 0 every time. I have no idea what happened. It is very weird. I asked my manager to reinstall it, and I am also asking whether you have a clue. Initializing STRUMPACK
|
And this is the configuration file from my manager:
!#STRUMPACK 7.1.2
!###Prepare modules
!### Prepare directories
!### Download and extract
!## Configure, Build and Install
!## Modulefile
|
I checked the |
Hi, sorry to bother you again.
I would like to ask about the integer type. I didn't see any integer type in SRC/fortran. So, the default integer type is integer(4), right?
Now I need to use long integers, namely integer(8), with float complex. I changed the integer type in my code, but I get a segmentation fault like the one below:
Do I have to both compile and link with the flag -i8?
Also, could you please explain the difference between float complex and float complex_64?