joblib parallel for with graph-tool filtering?


joblib parallel for with graph-tool filtering?

Tasos
With graph-tool and joblib working together, do we need to pass graph.copy()
in the "Parallel" call, as in the code below, when using vertex filtering
with .set_vertex_filter? graph.copy() makes memory usage extreme on large
graphs (2M vertices, 4M edges), but in my head it rules out any concurrency
problems. (Or is passing 'graph' without '.copy()' OK?)

What is the best way to run parallel graph searches and filtering (a
different vertex per thread) with graph-tool and joblib (or without joblib)?


###
# defined and filled earlier
g_graph = graph_tool.Graph(directed=False)
eprop_ang = g_graph.new_edge_property("float")

###
from joblib import Parallel, delayed
import multiprocessing
import numpy as np
import os
import tempfile
import shutil
import datetime

path2 = tempfile.mkdtemp()
out_path2 = os.path.join(path2, 'z6path_out2.mmap')
out2 = np.memmap(out_path2, dtype=np.float32,
                 shape=(g_graph.num_vertices(), dims), mode='w+')

num_cores = 30
num_pre_workers = 60

def runparallel(graph, row, out2):
    dist, pred = graph_tool.search.dijkstra_search(graph, graph.vertex(row),
                                                   weight=eprop_ang)
    ## etc etc
    #####

    v_filter = graph.new_vertex_property('bool', val=False)
    for v in SOMETHING_LOCAL:
        v_filter[v] = True
    graph.set_vertex_filter(v_filter)
    # do something with the filtered 'graph' (subgraph)
    # and save the result for this vertex to the shared memmap
    out2[row] = RESULT
    ##
    graph.clear_filters()


Parallel(n_jobs=num_cores, pre_dispatch=num_pre_workers, verbose=1)(
    delayed(runparallel)(g_graph.copy(), r, out2)
    for r in range(g_graph.num_vertices()))




--
Sent from: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/
_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

Re: joblib parallel for with graph-tool filtering?

Tiago Peixoto
Administrator
On 28.04.2018 at 16:26, Tasos wrote:
> What is the best way to run parallel graph searches and filtering (different
> vertex per thread) with graph-tool and joblib? (or without joblib)

The best approach is to create a separate GraphView object for each filter,
instead of setting the filter on the main graph. Read about GraphViews here:

    https://graph-tool.skewed.de/static/doc/quickstart.html#graph-views

Best,
Tiago
--
Tiago de Paula Peixoto <tiago@skewed.de>

Re: joblib parallel for with graph-tool filtering?

cmos
Hi, I have the same question. When running code that attempts to use
GraphViews, I get an error during pickling:

error: 'i' format requires -2147483648 <= number <= 2147483647

More specifically, it looks like a line inside joblib is unhappy:
CustomizablePickler(buffer, self._reducers).dump(obj)

which leads to a struct packing line: header = struct.pack("!i", n)
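
In isolation, that 32-bit bound is easy to reproduce with just the stdlib (a
sketch; my reading is that a pickled payload of 2 GiB or more overflows the
signed 32-bit length header):

```python
import struct

INT32_MAX = 2**31 - 1  # 2147483647, the upper bound in the error message

# The length header is a signed 32-bit big-endian integer; any payload
# size at or above 2 GiB cannot be represented in it.
header = struct.pack("!i", INT32_MAX)   # largest size that still fits
try:
    struct.pack("!i", INT32_MAX + 1)    # one past the bound
    overflowed = False
except struct.error:
    overflowed = True                   # same error class as above
```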


So, if I had to guess, I'd suspect joblib is trying to pickle the whole
graph rather than just a reference to the GraphView, or something like
that. Has either of you managed to parallelize successfully using
GraphViews to avoid copying?



This is on Python 3.6.5, with graph_tool version '2.26 (commit , )' and
joblib version '0.11'.

Below is a minimal breaking example, if it helps. I am also happy to provide
other information such as tracebacks.

def toy_func(g):
    return g.vertex_properties['skim'][0][1]

vmr = [0, 1]
g = load_graph(path) # 22,000 vertex directed graph (a road network)
skim_table = shortest_distance(g, weights=g.edge_properties["weight"])
g.properties['skim'] = skim_table
p(joblib.delayed(toy_func)(GraphView(g)) for i in range(10))




Re: joblib parallel for with graph-tool filtering?

Tiago Peixoto
Administrator
On 16.07.2018 at 03:34, cmos wrote:

> Hi, I have the same question. Upon running code attempting to use GraphViews,
> I get an error during pickling:
>
> error: 'i' format requires -2147483648 <= number <= 2147483647
>
> More specifically, it looks like a line inside joblib is unhappy:
> CustomizablePickler(buffer, self._reducers).dump(obj)
>
> And this takes us to a struct packing line: header = struct.pack("!i", n)
>
>
> So, if I had to guess, I'd suspect joblib is trying to pickle the whole
> graph rather than the GraphView reference, or something like this. Was
> either of you able to get code to successfully parallelize using GraphViews
> to avoid copying?

It is impossible to say anything without a minimal and self-contained
example that shows the problem.


> Below is a minimal breaking example, if it helps. I am also happy to provide
> other information such as tracebacks.
>
> def toy_func(g):
>     return g.vertex_properties['skim'][0][1]
>
> vmr = [0, 1]
> g = load_graph(path) # 22,000 vertex directed graph (a road network)
> skim_table = shortest_distance(g, weights=g.edge_properties["weight"])
> g.properties['skim'] = skim_table
> p(joblib.delayed(toy_func)(GraphView(g)) for i in range(10))


That is not a complete minimal example: the function 'p' is undefined, and
there are other errors. Please provide one that actually runs and does not
depend on external data.

Best,
Tiago

--
Tiago de Paula Peixoto <tiago@skewed.de>