Brief background: I am using Bayesian inference methods such as MCMC and HMC. So far I have been using the emcee package to infer model parameters with MCMC. The primary reason I chose emcee is that it supports MPI for parallelizing the walkers that sample the posterior distribution, so each walker runs on a separate core. A simple linear-regression example using emcee can be found here: https://emcee.readthedocs.io/en/stable/tutorials/line/
The issue: For my specific problem, I need to solve an eigenvalue problem at every MCMC step and compare the eigenvalues with the data. The matrix M whose eigenvalues I need is a complex Hermitian matrix, typically about 2500 x 2500. Each walker, working on a separate core, has to solve this eigenvalue problem. I can use numpy.linalg.eigh(), scipy.linalg.eigh() or scipy.sparse.linalg.eigsh() as the eigenvalue solver, and I know that each of them uses all the available cores when not run through an MPI implementation. But since I am parallelizing the walkers, each walker is restricted to a single core, so the eigenvalue solvers do not use any of the other cores (even if I provide more cores than the number of walkers).
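To make the setup concrete, here is a minimal sketch of the kind of log-probability each walker evaluates. It is not my actual model: `build_matrix`, the parameter vector `theta`, the Gaussian likelihood and `sigma` are hypothetical placeholders, and the matrix is kept small so the snippet runs quickly.

```python
import numpy as np


def build_matrix(theta, n=200):
    """Hypothetical stand-in for M(theta): a complex Hermitian n x n matrix."""
    rng = np.random.default_rng(0)  # fixed seed, so M depends only on theta
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    base = (a + a.conj().T) / 2  # Hermitian by construction
    return theta[0] * base + theta[1] * np.eye(n)


def log_prob(theta, data, sigma=1.0):
    """Gaussian log-likelihood (up to a constant) comparing eigenvalues to data."""
    M = build_matrix(theta, n=data.size)
    # eigvalsh returns only the (real) eigenvalues, in ascending order;
    # this call is the expensive step that every walker repeats at every step.
    eigvals = np.linalg.eigvalsh(M)
    resid = (data - eigvals) / sigma
    return -0.5 * np.sum(resid**2)


# Synthetic "data": eigenvalues generated from known parameters.
theta_true = np.array([1.0, 0.5])
data = np.linalg.eigvalsh(build_matrix(theta_true, n=50))
print(log_prob(theta_true, data), log_prob(np.array([1.1, 0.5]), data))
```

In the real run, a function like `log_prob` is what gets handed to the emcee sampler, and the eigenvalue call inside it is the part that ends up confined to one core per walker.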
My questions: Answering any or all of the questions below would be of great help.
- How can I solve this issue?
- Is there a way this can be done on a GPU to speed up the calculation?
- Has anyone used MCMC or HMC to solve such a problem where multiple walkers are used and an eigenvalue problem needs to be solved?
My runtime tests comparing the different eigenvalue solvers are in this GitHub repository: https://github.com/srijaniiserprinceton/test_eigprob