do_djk_search without distance initialization

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

do_djk_search without distance initialization

François Kawala
Hello, 

I try to dig up an old idea that we've discussed previously (the thread is here). 

summary: when one runs shortest distance on a graph with several million vertices the distance initialization cost is high with respect to the actual Dijsktra cost. We want to circumvent the initialization [see L326 of src/graph/topology/graph_distance.cc]

The proposed approach (as per this boost thread) is to use the `dijkstra_shortest_paths_no_color_map_no_init`. The distance initialization is done "on the fly" by the `examine_edge` function. More precisely, the Dijkstra visitor maintains a set of discovered vertices. This set is read in the `examine_edge` function to set the distance to infinity when a the target vertex does not belong to the set of discovered vertices. The `examine_edge` function is also responsible to update the  set of discovered vertices.

As proposed in the initial thread, we would like to add a init-or-not switch to the shortest_distance parameters (see L1568 of /src/graph_tool/topology/__init__.py).

I plan to : 
  • define a new Dijkstra visitor named `djk_max_visitor_no_init` (and maybe a second one to handle multiple targets scenario) 
  • update the `do_djk_search` function to use `dijkstra_shortest_paths_no_color_map_no_init`
  • update the python shortest_distance (add the init-or-not switch)
How does it sounds ? Am I missing something ? 

I have two side questions : 
  • what is the "standard" development environment ? I tried to use CLION but I cannot figure out how to configure it properly so that all the boost dependencies are correctly resolved. 
  • how do you debug during development ?
Best regards, 
François.


_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: do_djk_search without distance initialization

Tiago Peixoto
Administrator
On 28.06.2017 15:00, François Kawala wrote:

> Hello,
>
> I try to dig up an old idea that we've discussed previously (the thread is
> here
> <http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/Shortest-distance-complexity-when-used-with-max-dist-td4026018.html>).
>
> summary: when one runs shortest distance on a graph with several million
> vertices the distance initialization cost is high with respect to the actual
> Dijsktra cost. We want to circumvent the initialization [see L326 of
> src/graph/topology/graph_distance.cc
> <https://git.skewed.de/count0/graph-tool/blob/master/src/graph/topology/graph_distance.cc#L326>]
>
> The proposed approach (as per this boost thread
> <https://groups.google.com/forum/#!topic/boost-list/7nCzvkkEXCk>) is to use
> the `dijkstra_shortest_paths_no_color_map_no_init`. The distance
> initialization is done "on the fly" by the `examine_edge` function. More
> precisely, the Dijkstra visitor maintains a set of discovered vertices. This
> set is read in the `examine_edge` function to set the distance to infinity
> when a the target vertex does not belong to the set of discovered vertices.
> The `examine_edge` function is also responsible to update the  set of
> discovered vertices.
>
> As proposed in the initial thread, we would like to add a init-or-not switch
> to the shortest_distance parameters (see L1568 of
> /src/graph_tool/topology/__init__.py
> <https://git.skewed.de/count0/graph-tool/blob/master/src/graph_tool/topology/__init__.py#L1568>).
>
> I plan to :
>
>   * define a new Dijkstra visitor named `djk_max_visitor_no_init` (and maybe
>     a second one to handle multiple targets scenario)
>   * update the `do_djk_search` function to use
>     `dijkstra_shortest_paths_no_color_map_no_init`
>   * update the python shortest_distance (add the init-or-not switch)
>
> How does it sounds ? Am I missing something ?
I don't see the drawback of always using the no_init version, with a single
visitor that always does implicit initialization. The init=True version
would be redundant, but at almost no extra cost. If the extra cost is
important, the user should be using init=False anyways...

> I have two side questions :
>
>   * what is the "standard" development environment ? I tried to use CLION
>     <https://www.jetbrains.com/clion/> but I cannot figure out how to
>     configure it properly so that all the boost dependencies are correctly
>     resolved.

There is no such thing... I just use a text editor (emacs).

>   * how do you debug during development ?

GDB, valgrind, etc.

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>


_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

signature.asc (849 bytes) Download Attachment
--
Tiago de Paula Peixoto <tiago@skewed.de>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: do_djk_search without distance initialization

Tiago Peixoto
Administrator
In reply to this post by François Kawala
On 28.06.2017 15:00, François Kawala wrote:

>
> I try to dig up an old idea that we've discussed previously (the thread is
> here
> <http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/Shortest-distance-complexity-when-used-with-max-dist-td4026018.html>).
>
> summary: when one runs shortest distance on a graph with several million
> vertices the distance initialization cost is high with respect to the actual
> Dijsktra cost. We want to circumvent the initialization [see L326 of
> src/graph/topology/graph_distance.cc
> <https://git.skewed.de/count0/graph-tool/blob/master/src/graph/topology/graph_distance.cc#L326>]
>
> The proposed approach (as per this boost thread
> <https://groups.google.com/forum/#%21topic/boost-list/7nCzvkkEXCk>) is to
> use the `dijkstra_shortest_paths_no_color_map_no_init`. The distance
> initialization is done "on the fly" by the `examine_edge` function. More
> precisely, the Dijkstra visitor maintains a set of discovered vertices. This
> set is read in the `examine_edge` function to set the distance to infinity
> when a the target vertex does not belong to the set of discovered vertices.
> The `examine_edge` function is also responsible to update the  set of
> discovered vertices.
I've pushed a commit:

   https://git.skewed.de/count0/graph-tool/commit/bf174831

that replaces the search functions by no_init ones. The initialization is
still done, but only once via a single numpy call in the Python side of things.

The speed of the numpy call should be comparable to what we get with
allocation, which is unavoidable.

Please see if the performance improvements you observe with this is acceptable.

Best,
Tiago


--
Tiago de Paula Peixoto <[hidden email]>


_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

signature.asc (849 bytes) Download Attachment
--
Tiago de Paula Peixoto <tiago@skewed.de>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: do_djk_search without distance initialization

François Kawala
Hello Tiago, 

Thanks a lot for swift update. I timed the 2.22 against commit bf174831 (compiled with CXXFLAGS=-O3). That does not give an accurate drill 

On my typical ~7e6 vertices graph, for a typical search that reach about 1e5 vertices, the improvement is about 20%. We have 83ms (average over 100 runs) for the 2.22 release versus 69ms for commit bf174831. Interestingly, the bf174831 version is more stable with std = 4ms versus 6ms for the 2.22 release.

What I understand from the boost thread is that we could have only initialization for the whole lifespan of the program. If I get it right, their proposal would, prior to a search, to "re-set" the values from distances vector and predecessors vector only for the reached vertices of the last search. I think it worth a try because, from what I timed the python side of initialization could cost around 18ms (5ms for distance map, 13ms for predecessor map) see timeit snipet below.

import timeit
# pred map
timeit.timeit(setup='import numpy as np; x=np.zeros(int(7e6),dtype="int64")', stmt='x[:] = np.arange(7e6)', number=1000)/1000
# distances
timeit.timeit(setup='import numpy as np; x=np.zeros(int(7e6),dtype="int64")', stmt='x[:] = np.inf)', number=1000)/1000

I did the above timings on a modern work station, to switch to a cloud provider induces a non negligible overhead (+63% for the distances / +469% for the pred map).

What do you think ?

F.



Le mer. 28 juin 2017 à 18:58, Tiago de Paula Peixoto <[hidden email]> a écrit :
On 28.06.2017 15:00, François Kawala wrote:
>
> I try to dig up an old idea that we've discussed previously (the thread is
> here
> <http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/Shortest-distance-complexity-when-used-with-max-dist-td4026018.html>).
>
> summary: when one runs shortest distance on a graph with several million
> vertices the distance initialization cost is high with respect to the actual
> Dijsktra cost. We want to circumvent the initialization [see L326 of
> src/graph/topology/graph_distance.cc
> <https://git.skewed.de/count0/graph-tool/blob/master/src/graph/topology/graph_distance.cc#L326>]
>
> The proposed approach (as per this boost thread
> <https://groups.google.com/forum/#%21topic/boost-list/7nCzvkkEXCk>) is to
> use the `dijkstra_shortest_paths_no_color_map_no_init`. The distance
> initialization is done "on the fly" by the `examine_edge` function. More
> precisely, the Dijkstra visitor maintains a set of discovered vertices. This
> set is read in the `examine_edge` function to set the distance to infinity
> when a the target vertex does not belong to the set of discovered vertices.
> The `examine_edge` function is also responsible to update the  set of
> discovered vertices.

I've pushed a commit:

   https://git.skewed.de/count0/graph-tool/commit/bf174831

that replaces the search functions by no_init ones. The initialization is
still done, but only once via a single numpy call in the Python side of things.

The speed of the numpy call should be comparable to what we get with
allocation, which is unavoidable.

Please see if the performance improvements you observe with this is acceptable.

Best,
Tiago


--
Tiago de Paula Peixoto <[hidden email]>

_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: do_djk_search without distance initialization

Tiago Peixoto
Administrator
On 29.06.2017 11:58, François Kawala wrote:
> On my typical ~7e6 vertices graph, for a typical search that reach about 1e5
> vertices, the improvement is about 20%. We have 83ms (average over 100 runs)
> for the 2.22 release versus 69ms for commit bf174831. Interestingly, the
> bf174831 version is more stable with std = 4ms versus 6ms for the 2.22 release.

I wonder how much time is spent in the actual search vs. initialization. If
you use something like cProfile you can see how much time is spent in the
actual C++ function vs. the rest. This would give us an idea of how much
actual improvement can be done.

> What I understand from the boost thread
> <https://groups.google.com/forum/#%21topic/boost-list/7nCzvkkEXCk> is that
> we could have only initialization for the whole lifespan of the program. If
> I get it right, their proposal would, prior to a search, to "re-set" the
> values from distances vector and predecessors vector only for the reached
> vertices of the last search. I think it worth a try because, from what I
> timed the python side of initialization could cost around 18ms (5ms for
> distance map, 13ms for predecessor map) see timeit snipet below.

In order for this to work, the function would need to keep and return a list
of reached vertices. This is is not difficult to do, but I wonder how much
it would actually improve...

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>


_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

signature.asc (849 bytes) Download Attachment
--
Tiago de Paula Peixoto <tiago@skewed.de>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: do_djk_search without distance initialization

Tiago Peixoto
Administrator
In reply to this post by François Kawala
On 29.06.2017 11:58, François Kawala wrote:
> I did the above timings on a modern work station, to switch to a cloud
> provider induces a non negligible overhead (+63% for the distances / +469%
> for the pred map).

This I don't understand at all. Why would the behavior in a "cloud"
environment be any different?

--
Tiago de Paula Peixoto <[hidden email]>


_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

signature.asc (849 bytes) Download Attachment
--
Tiago de Paula Peixoto <tiago@skewed.de>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: do_djk_search without distance initialization

François Kawala
About the run time in cloud computing scenario my measure might be useless. A cloud instance is a VM that runs on I don't know what hardware and might be shared with I don't know who. So even if inside my VM the CPU was idle, I don't now how the CPU was used by other VMs. That being said, for our particular cloud service provider we observe (on a daily basis for several months) that the performances are below what we have on our modern workstations. 

Just to be clear : I'm neither complaining about graph tool performances, nor thinking about improvements that would be cloud computing driven. I just wanted to make clear that modest improvement (as measured on a modern work station) might be very meaningful in another environment. 

In order to profile the initialization I moved the initialization code in two functions init_pmap and init_dist, and did 100 runs of shortest_distance with identical parameters, cProfiles gives : 

23484 function calls (23331 primitive calls) in 6.733 seconds
Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
100     3.723    0.037    6.731    0.067   graph_tool/topology/__init__.py:1512(shortest_distance)
100     0.001    0.000    1.833    0.018   graph_tool/topology/__init__.py:1507(init_pmap)
100     0.000    0.000    1.833    0.018   graph_tool/decorators.py:1(copy_property)
200/100 0.003    0.000    1.832    0.018   graph_tool/decorators.py:126(wrapper)
100     1.275    0.013    1.827    0.018   graph_tool/__init__.py:2348(copy_property)
100     0.001    0.000    1.163    0.012   graph_tool/topology/__init__.py:1500(init_dist)
100     0.000    0.000    0.645    0.006   graph_tool/__init__.py:661(<lambda>)
100     0.643    0.006    0.644    0.006   graph_tool/__init__.py:614(__get_set_f_array)
700     0.001    0.000    0.540    0.001   graph_tool/__init__.py:166(_prop)
700     0.003    0.000    0.538    0.001   graph_tool/__init__.py:392(_get_any)
700     0.534    0.001    0.534    0.001   graph_tool/__init__.py:814(reserve)
200     0.000    0.000    0.518    0.003   graph_tool/__init__.py:559(get_array)
200     0.515    0.003    0.517    0.003   graph_tool/__init__.py:581(_get_data)
200     0.004    0.000    0.011    0.000   graph_tool/__init__.py:2289(new_vertex_property)
200     0.002    0.000    0.010    0.000   graph_tool/__init__.py:3071(__init__)
600     0.002    0.000    0.008    0.000   graph_tool/__init__.py:377(__init__)
200     0.002    0.000    0.007    0.000   graph_tool/__init__.py:1531(__init__)
100     0.000    0.000    0.006    0.000   graph_tool/__init__.py:2274(new_property)
800     0.002    0.000    0.004    0.000   graph_tool/__init__.py:202(_type_alias)
599     0.001    0.000    0.003    0.000   graph_tool/__init__.py:437(__del__)
600     0.001    0.000    0.002    0.000   graph_tool/__init__.py:264(_converter)
599     0.002    0.000    0.002    0.000   graph_tool/__init__.py:432(__unregister_map)
200     0.001    0.000    0.002    0.000   graph_tool/__init__.py:909(__new__)

The complete cprofile output is attached to this mail.

Le jeu. 29 juin 2017 à 13:16, Tiago de Paula Peixoto <[hidden email]> a écrit :
On 29.06.2017 11:58, François Kawala wrote:
> I did the above timings on a modern work station, to switch to a cloud
> provider induces a non negligible overhead (+63% for the distances / +469%
> for the pred map).

This I don't understand at all. Why would the behavior in a "cloud"
environment be any different?

--
Tiago de Paula Peixoto <[hidden email]>

_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

profile (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: do_djk_search without distance initialization

François Kawala
> we could have only **one** initialization for the whole lifespan of the program. If

> I get it right, their proposal would, prior to a search, to "re-set" the
> values from distances vector and predecessors vector only for the reached
> vertices of the last search. I think it worth a try because, from what I
> timed the python side of initialization could cost around 18ms (5ms for
> distance map, 13ms for predecessor map) see timeit snipet below.

In order for this to work, the function would need to keep and return a list
of reached vertices. This is is not difficult to do, but I wonder how much
it would actually improve...

To elaborate on this, a quick an dirty profiling (with the timeit module) gives 0.27 ms to reset 10e5 random items to np.inf in a numpy array with 7e6 items.

I'm confident that this improvement worth to be done and I can try to do it. 

Once initialized, the distance map, the predecessor map, and the "last reached index" would be internal property maps, correct ? Would it be relevant to do the "re-set" in the CPP part ? 

François

_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: do_djk_search without distance initialization

François Kawala
I realize that I'm not familiar enough with boost to do this change. 

From what I get, I'll add three private members : _distance and _predecessor they would be initialized as follows : 

     _distance(get(vertex_distance, *_mg)),
     _predecessor(get(predecessor, *_mg)),

I don't know if the _distance member should be filled with zeros ? 

The last private member would be _reach a  std::vector<size_t>

From that I'll declare two functions : 

     set_reached to update the reached vertices
     reset_distance to reset the _distance, _predecessor and _reach to their default values.

Does that sounds right ? If so, I'll guess that I'll have to make the do_djk_search function to call the set_reached and reset_distance functions.

Am I missing something ? 

Sorry for this stuttering approach. 

François.

 


Le jeu. 29 juin 2017 à 18:22, François Kawala <[hidden email]> a écrit :
> we could have only **one** initialization for the whole lifespan of the program. If

> I get it right, their proposal would, prior to a search, to "re-set" the
> values from distances vector and predecessors vector only for the reached
> vertices of the last search. I think it worth a try because, from what I
> timed the python side of initialization could cost around 18ms (5ms for
> distance map, 13ms for predecessor map) see timeit snipet below.

In order for this to work, the function would need to keep and return a list
of reached vertices. This is is not difficult to do, but I wonder how much
it would actually improve...

To elaborate on this, a quick an dirty profiling (with the timeit module) gives 0.27 ms to reset 10e5 random items to np.inf in a numpy array with 7e6 items.

I'm confident that this improvement worth to be done and I can try to do it. 

Once initialized, the distance map, the predecessor map, and the "last reached index" would be internal property maps, correct ? Would it be relevant to do the "re-set" in the CPP part ? 

François

_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: do_djk_search without distance initialization

Tiago Peixoto
Administrator
On 30.06.2017 18:44, François Kawala wrote:

> I realize that I'm not familiar enough with boost to do this change.
>
> From what I get, I'll add three private members : _distance and _predecessor
> they would be initialized as follows :
>
>      _distance(get(vertex_distance, *_mg)),
>      _predecessor(get(predecessor, *_mg)),
>
> I don't know if the _distance member should be filled with zeros ?
>
> The last private member would be _reach a  std::vector<size_t>
>
> From that I'll declare two functions :
>
>      set_reached to update the reached vertices
>      reset_distance to reset the _distance, _predecessor and _reach to their
> default values.
>
> Does that sounds right ? If so, I'll guess that I'll have to make
> the do_djk_search function to call the set_reached and reset_distance functions.
>
> Am I missing something ?
>
> Sorry for this stuttering approach.
I've just pushed a commit that implements what you want; you can look inside
for the details.

In short, you can now do multiple searches as follows:

    dist_map = g.new_vp("double", numpy.inf)  # must be infinity
    pred_map = g.vertex_index.copy()          # must be identity

    for source in sources:

        # the following does not initialize dist_map and pred_map, and
        # returns an array of reached vertices

        dist_map, pred_map, reached = \
            shortest_distance(g, source, weights=w, pred_map=pred_map,
                              dist_map=dist_map, return_reached=True)

        # reset property maps for next search; this is O(len(reached))

        dist_map.a[reached] = numpy.inf
        pred_map.a[reached] = reached

Please tell me if this brings the improvements you were seeking.

Best,
Tiago





_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

signature.asc (849 bytes) Download Attachment
--
Tiago de Paula Peixoto <tiago@skewed.de>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: do_djk_search without distance initialization

François Kawala
I cProfiled old versus your last commit. The graph has 7 953 158 vertices, I sample 100 vertices and do shortest_distance from each one. I specify no target, and set the max_dist parameter. After each call to shortest_distance, I use the reached array to reset the predecessor map. In average each search reaches 120177 vertices.
  • 2.22 : 9.101 seconds
  • commit 26404ef4 :  4.536 seconds
  • commit 26404ef4 + no dist map init : 4.141 seconds
That more than twice faster, great news ! 

About the third configuration (no dist map init), I removed the distance map initialization (graph_tool/topology/__init__.py L1779) and used the reached array to reset dist_map to infinity.

We could have a do_initialization parameter, I think it would be more explicit, see my proposal.

François.




Le ven. 30 juin 2017 à 21:59, Tiago de Paula Peixoto <[hidden email]> a écrit :
On 30.06.2017 18:44, François Kawala wrote:
> I realize that I'm not familiar enough with boost to do this change.
>
> From what I get, I'll add three private members : _distance and _predecessor
> they would be initialized as follows :
>
>      _distance(get(vertex_distance, *_mg)),
>      _predecessor(get(predecessor, *_mg)),
>
> I don't know if the _distance member should be filled with zeros ?
>
> The last private member would be _reach a  std::vector<size_t>
>
> From that I'll declare two functions :
>
>      set_reached to update the reached vertices
>      reset_distance to reset the _distance, _predecessor and _reach to their
> default values.
>
> Does that sounds right ? If so, I'll guess that I'll have to make
> the do_djk_search function to call the set_reached and reset_distance functions.
>
> Am I missing something ?
>
> Sorry for this stuttering approach.

I've just pushed a commit that implements what you want; you can look inside
for the details.

In short, you can now do multiple searches as follows:

    dist_map = g.new_vp("double", numpy.inf)  # must be infinity
    pred_map = g.vertex_index.copy()          # must be identity

    for source in sources:

        # the following does not initialize dist_map and pred_map, and
        # returns an array of reached vertices

        dist_map, pred_map, reached = \
            shortest_distance(g, source, weights=w, pred_map=pred_map,
                              dist_map=dist_map, return_reached=True)

        # reset property maps for next search; this is O(len(reached))

        dist_map.a[reached] = numpy.inf
        pred_map.a[reached] = reached

Please tell me if this brings the improvements you were seeking.

Best,
Tiago




_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: do_djk_search without distance initialization

Tiago Peixoto
Administrator
On 03.07.2017 12:16, François Kawala wrote:
> I cProfiled old versus your last commit. The graph has 7 953 158 vertices, I
> sample 100 vertices and do shortest_distance from each one. I specify no
> target, and set the max_dist parameter. After each call to
> shortest_distance, I use the reached array to reset the predecessor map. In
> average each search reaches 120177 vertices.
>
>   * 2.22 : 9.101 seconds
>   * commit 26404ef4 :  4.536 seconds
>   * commit 26404ef4 + no dist map init : 4.141 seconds

Well, the last improvement is only about 9%... We're clearly seeing
diminishing returns. As I had expected, numpy initialization is pretty fast.

> About the third configuration (no dist map init), I removed the distance map
> initialization (graph_tool/topology/__init__.py L1779
> <https://git.skewed.de/count0/graph-tool/blob/master/src/graph_tool/topology/__init__.py#L1779>)
> and used the reached array to reset dist_map to infinity.

Did you do the same with the predecessor map as well?

> We could have a do_initialization parameter, I think it would be more
> explicit, see my proposal
> <https://git.skewed.de/francois/graph-tool/commit/946826546a80f1f080681a4eaaa5fb2590b5abd4>.

I don't like this very much. The list of reached vertices might be useful
even if the user is not interested in the no_init optimization, so I think
it is better decoupled. Furthermore, you are ignoring the predecessor map,
which will give a cryptic error if the user chooses do_initialization=False
and forgets to pass pred_map, and will ignore the value passed by the user
otherwise.

--
Tiago de Paula Peixoto <[hidden email]>


_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

signature.asc (849 bytes) Download Attachment
--
Tiago de Paula Peixoto <tiago@skewed.de>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: do_djk_search without distance initialization

François Kawala
Le lun. 3 juil. 2017 à 13:00, Tiago de Paula Peixoto <[hidden email]> a écrit :
On 03.07.2017 12:16, François Kawala wrote:
> I cProfiled old versus your last commit. The graph has 7 953 158 vertices, I
> sample 100 vertices and do shortest_distance from each one. I specify no
> target, and set the max_dist parameter. After each call to
> shortest_distance, I use the reached array to reset the predecessor map. In
> average each search reaches 120177 vertices.
>
>   * 2.22 : 9.101 seconds
>   * commit 26404ef4 :  4.536 seconds
>   * commit 26404ef4 + no dist map init : 4.141 seconds

Well, the last improvement is only about 9%... We're clearly seeing
diminishing returns. As I had expected, numpy initialization is pretty fast.

Yes, numpy does its job quite well, still 9% is non-negligible to me.
 
> About the third configuration (no dist map init), I removed the distance map
> initialization (graph_tool/topology/__init__.py L1779
> <https://git.skewed.de/count0/graph-tool/blob/master/src/graph_tool/topology/__init__.py#L1779>)
> and used the reached array to reset dist_map to infinity.

Did you do the same with the predecessor map as well?


Yep (see above).
 
> We could have a do_initialization parameter, I think it would be more
> explicit, see my proposal
> <https://git.skewed.de/francois/graph-tool/commit/946826546a80f1f080681a4eaaa5fb2590b5abd4>.

I don't like this very much. The list of reached vertices might be useful
even if the user is not interested in the no_init optimization, so I think
it is better decoupled. Furthermore, you are ignoring the predecessor map,
which will give a cryptic error if the user chooses do_initialization=False
and forgets to pass pred_map, and will ignore the value passed by the user
otherwise.


I agree. We could split the function into "shortest_distance" and "shortest_distance_no_init". But, if I'm the sole user of this function it'll be much more sensible that I implement it my projet.

Another related question, why it is necessary to copy the reached array ?

Regards,
François
 
--
Tiago de Paula Peixoto <[hidden email]>

_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: do_djk_search without distance initialization

Tiago Peixoto
Administrator
On 03.07.2017 14:06, François Kawala wrote:
>
> Another related question, why it is necessary to copy the reached array ?

Because ownership issues. The function works with a std::vector<>
internally, because numpy's ndarrays cannot be dynamically increased. We
could return a wrapped std::vector (called Vector_size_t internally in
graph-tool), but these are not documented. So instead, the std::vector's
contents are copied as a ndarray.

--
Tiago de Paula Peixoto <[hidden email]>


_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

signature.asc (849 bytes) Download Attachment
--
Tiago de Paula Peixoto <tiago@skewed.de>
Loading...