Questions on Parallelisation

Questions on Parallelisation

P-M
Hello Tiago,

I am trying to outsource some of my calculations to the University's cluster
as the compute times for some of my datasets are getting very lengthy. I had
a couple of questions arising out of this and was wondering whether you had
any thoughts/advice on them:

1. The University normally limits runtimes of a given job to 12h and
suggests checkpointing to get around this so that the calculation can simply
resume as a new job. Apparently the cluster has "DMTCP: Distributed
MultiThreaded CheckPointing" installed in order to allow checkpointing "for
some applications which do not have their own support for this". Is this
compatible with graph-tool or are you aware of any other ways in which I
could stop and restart a job? (What I am after is stopping and restarting a
graph-tool function which takes more than 12h to complete as I can obviously
pickle results of a calculation straightforwardly already.) If there is
currently no way of doing this, is this something you might consider doing at
some point in the future?

2. As far as I understand using OpenMP I can only ever use a single node at
a time and am thus limited by how many CPUs and how much RAM this node
supplies. To be able to use more CPUs I would need to use e.g. MPI. Are
there any plans to implement this at some point? (I realise this may not be
straightforward but was interested in your thoughts.)

3. I would sometimes find it useful to be able to get a progress update for
some of the functions to carry out a rough order-of-magnitude estimate of
required compute time. Two use cases:
    a) Calculating the betweenness centrality of a large network can take a
long time. Knowing how many nodes have already been covered would make it
possible to roughly extrapolate the remaining time and to see whether the
calculation is even feasible in the time available.
    b) When collecting data using mcmc_equilibrate this would potentially be
less meaningful, as the process is stochastic; however, knowing how many
sweeps have been completed would still help to roughly estimate whether
completion will take hours, days, weeks, or months.

Best wishes,

Philipp



_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

Re: Questions on Parallelisation

Tiago Peixoto
Administrator
On 19.06.2018 at 17:29, P-M wrote:

> Hello Tiago,
>
> I am trying to outsource some of my calculations to the University's cluster
> as the compute times for some of my datasets are getting very lengthy. I had
> a couple of questions arising out of this and was wondering whether you had
> any thoughts/advice on them:
>
> 1. The University normally limits runtimes of a given job to 12h and
> suggests checkpointing to get around this so that the calculation can simply
> resume as a new job. Apparently the cluster has "DMTCP: Distributed
> MultiThreaded CheckPointing" installed in order to allow checkpointing "for
> some applications which do not have their own support for this". Is this
> compatible with graph-tool or are you aware of any other ways in which I
> could stop and restart a job? (What I am after is stopping and restarting a
> graph-tool function which takes more than 12h to complete as I can obviously
> pickle results of a calculation straightforwardly already.) If there is
> currently no way of doing this, is this something you might consider doing at
> some point in the future?

DMTCP is the right solution to this problem, and it should work. I use it
myself without any issues.
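For reference, a sketch of what a DMTCP-wrapped batch job might look like. The script name, checkpoint interval, and file names are illustrative; the cluster's own DMTCP documentation should be checked for the exact invocation:

```shell
# First job: run the analysis under DMTCP, writing a checkpoint
# every hour so that at most one hour of work is lost when the
# 12 h limit kills the job.
dmtcp_launch --interval 3600 python my_analysis.py

# Subsequent jobs: resume from the restart script that dmtcp_launch
# writes alongside the ckpt_*.dmtcp files in the working directory.
./dmtcp_restart_script.sh
```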

> 2. As far as I understand using OpenMP I can only ever use a single node at
> a time and am thus limited by how many CPUs and how much RAM this node
> supplies. To be able to use more CPUs I would need to use e.g. MPI. Are
> there any plans to implement this at some point? (I realise this may not be
> straightforward but was interested in your thoughts.)

There are no plans to implement MPI support in graph-tool. It would require a
major redesign of the algorithms, and I have no interest in doing this.

> 3. I would sometimes find it useful to be able to get a progress update for
> some of the functions to carry out a rough order-of-magnitude estimate of
> required compute time. Two use cases:
>     a) Calculating the betweenness centrality of a large network can take a
> long time. Knowing how many nodes have already been covered would make it
> possible to roughly extrapolate the remaining time and to see whether the
> calculation is even feasible in the time available.

It would be feasible to implement this, but I'd rather keep the code simple.
It would also interfere with OpenMP, since progress would need to be tracked
for each thread but reported synchronously. That is a lot of work for the
minor convenience of a progress bar... Unless there is a very elegant way of
doing this, I'd rather not.
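Absent built-in reporting, a rough order-of-magnitude estimate can often be obtained by timing a pilot run on a subset of the work and extrapolating linearly. A minimal stdlib sketch of that arithmetic (the function name and numbers are illustrative, not part of graph-tool):

```python
def estimate_total_seconds(elapsed_s, units_done, units_total):
    """Linearly extrapolate total runtime from a timed pilot run.

    Assumes each unit of work (e.g. one source vertex in a
    betweenness computation) costs roughly the same, which is
    optimistic for networks with a skewed degree distribution.
    """
    if units_done <= 0:
        raise ValueError("pilot run completed no work")
    return elapsed_s * units_total / units_done

# 120 s to process 500 of 100,000 source vertices in a pilot run:
total = estimate_total_seconds(120.0, 500, 100_000)
print(total / 3600)  # roughly 6.7 hours
```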

>     b) When collecting data using mcmc_equilibrate this would potentially be
> less meaningful, as the process is stochastic; however, knowing how many
> sweeps have been completed would still help to roughly estimate whether
> completion will take hours, days, weeks, or months.

This you can get by simply passing "verbose=True".
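Beyond verbose output, completed sweeps can also be tracked programmatically. A toy sketch of the pattern, where `fake_equilibrate` is only a stand-in for mcmc_equilibrate (which accepts a similar `callback` parameter, though the real callback receives the state rather than a counter):

```python
import time

def fake_equilibrate(n_sweeps, callback=None):
    """Stand-in for mcmc_equilibrate: runs a fixed number of sweeps
    and invokes the callback after each one."""
    for i in range(n_sweeps):
        # a real MCMC sweep would run here
        if callback is not None:
            callback(i + 1)

class Progress:
    """Count sweeps and extrapolate the remaining runtime."""
    def __init__(self, target):
        self.target = target
        self.t0 = time.monotonic()
        self.done = 0
        self.eta = None

    def __call__(self, done):
        self.done = done
        elapsed = time.monotonic() - self.t0
        self.eta = elapsed * (self.target - done) / done  # seconds left

p = Progress(target=2000)
fake_equilibrate(2000, callback=p)
print(p.done)  # 2000
```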

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>