Feature eigen
Context
This MR generalizes the use of Eigen in waves.
Tasks
Following MR !50 (merged), here is the updated to-do list:
replacegmm::matrix<double>
byEigen::Matrix<double>
replacestd::vector<double>
byEigen::(Row)Vector<double>
refactor elementary matrices computationchange method return type, ievoid build(Matrix const &A, Matrix &K)
should becomeMatrix K build(Matrix const &A)
replacetbox::Pt
byEigen::VectorXd
investigate Pardiso interfaceinvestigate Eigen multithreading feature- clean the thermo-mechanical matrices
Notes
- I added a general interface
tbox::LinearSolver
from which derivetbox::Mumps
,tbox::Pardiso
,tbox::Gmres
andtbox::SparseLu
. The first class is a direct interface to the MUMPS solver, while the other three merely call Eigen interface or implementation of these solvers. The linear solvers are interfaced in python and the nonlinear solvers must now be initialized with a problem and a linear solver. - I did not clean the thermo-mechanical matrices computation in
tbox::Tetra4
andtbox::Hex8
. This could be further addressed in the branch clean_mirrors, which brings LFS support for large geometry files and improved tests. - Regarding the multi-threading in Eigen, it does not make sense to use it if the top level functions in waves are multi-threaded, which should be the case. However, multi-threading can be used for linear solvers. For now, only Pardiso can use multi-threading (GMRES and SparseLU are not multi-threaded and I did not look into MUMPS).
Additional changes
- I replaced the
std::vector
holding the shape functions byEigen::Matrices
, and consequently updated the Cache, ShapeFunction and Gauss classes. Additionally, the members of Cache and Gauss are now private and can only be accessed by getters (which I inlined). - I replaced the
std::vector
bystd::deque
to hold the list ofEigen::Triplet
for the global matrices assembly, as it proved to be faster. - I changed the assembly procedure of periodic BCs (
tbox::MshDeform
) and Kutta condition (flow::Newton
andflow::Picard
) so that the contributions are directly assembled on the right rows, thus avoiding the costly manipulation of the rows after the matrix has been built.
Tests
Passed on ubuntu18.04 (python 2.7.15), debian4.9 (python 2.7.13), msys2 (python 3.8.1).
Merge request reports
Activity
@Kim.Liegeois I need your help to adapt build_thermal and build_thermomecanical in Tetra4 and Hex8 classes. I would need at least the formula used to compute the matrices. I could reconstruct it be "de-rolling" the loops, but I ma lazy :). Also, your tests contain no value check, I therefore cannot be sure of what I am doing...
130 130 131 131 // List of triplets to build Jacobian matrix 132 132 std::vector<Eigen::Triplet<double>> T; 133 T.reserve(sol->pbl->msh->nodes.size()); 133 T.reserve(sol->pbl->msh->nodes.size() * sol->pbl->msh->nodes.size() / 1000); The classic way of creating sparse matrices in Eigen is as follows:
- create a std::vector, each Triplet containing the row index, column index and value of a matrix entry
- create a Triplet and push it for each matrix entry we want to add
- let Eigen loop over the vector and create the matrix
The problem with this approach is that each time we want to add a value at the position (rowi, colj), we create a new entry in the vector, even if the value at (rowi, colj) is already existing. This results in a very large vector, which uses a lot of memory.
A simple fix would be to use a map instead of a vector, in order to first check if an entry (rowi, colj) is already existing before adding a new value to this entry. But then, we would need to figure out how to collect the data and pass them to Eigen, without losing performance...
What do you think?
Seeing the
setFromTriplets()
method, I think that the solution is simply to use astd::list<Triplet<double>>
instead of astd::vector<Triplet<double>>
. In that case, noreserve
will be needed and any new insertion will not "erase" or "move" the previous entries.Since
setFromTriplets()
just need to iterate through your container, you must choose the one that has a constant time of insertion, which is not the case ofstd::vector<>
You can also test
std::deque<>
etstd::forward_list<>
If you want to keep a
vector<>
(vector
will be a little bit faster but requires the approximation of the final size with very bad consequences in terms of CPU time of memory if it's badly guessed), use a better approximation of the size:The size is not proportional to n * n.
n * n / 1000 is not enough for small meshes and far too much for very large meshes.
A better approximation is n * (number of nodes neighbouring a node in the mesh) = n * (number of elements neighbouring a node in the mesh) * (number of nodes per element-1) = n * 6 * 2, if we assume that the average number of elements neighbouring a node is a triangle mesh is 6.
If you want to be safe, use 7 or 8. You can also add a constant (1000?) for small meshes.
But I really prefer the use of a better container which avoids this guess. In Metafor we use
std::list
and it is very fast whatever the size of the mesh. I see that C++11 defines a "new"forward_list
if backward iteration is not necessary, which saves a pointer per element.Edited by Boman RomainThanks for
std::list
,std::deque
andstd::forward_list
, I have not thought of that!Regarding the estimate of the size, I know it does not scale as n^2 (it would make no sense), but n^2/1000 was the best simple approximation I could make quickly for time consuming problems (3D). That's also why I did not change the scaling in the other solvers and made that comment.
Actually, for the specific case of flow, the scaling would be even harder to estimate because of the upwinding and Kutta condition...In the end, I do agree with you that keeping
std::vector
is not the way to go.
added cleaning enhancement labels
added 1 commit
- 7ad50d31 - Update MKL init. Add Pardiso to benchmark. Add msys2 cmake file.
The code can be built successfully on macOS but fails at runtime:
rboman@spirou:~/dev/WAVES/waves$ python run.py --nogui fwk/tests/timers.py [findbins] looking for linux/mingw libs Fatal Python error: PyThreadState_Get: no current thread Abort trap: 6
On windows, with Visual Studio, the compilation of the SWIG interface generates an error:
f:\local\eigen\eigen\src\core\matrix.h(296): error C2664: 'void Eigen::PlainObjectBase<Eigen::Matrix<d ouble,3,1,0,3,1>>::_init1<T>(const double *)' : impossible de convertir l'argument 1 de 'const T' en 'Ei gen::EigenBase<Derived>::Index' [F:\dev\waves\build\tbox\_src\_tboxw.vcxproj]
I'm trying to understand why