efficiency of graph creation

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

efficiency of graph creation

thekswenson
I've been using networkx to simply create a graph and check the connected components.  The bottleneck of the operation is the creation of the edges.

I've heard that graph-tool is very efficient so I've replaces the code with a graph-tool graph.
To my surprise, the creation of a graph-tool graph is MUCH slower than that of a networkx graph.

Am I doing something wrong?

I've created a small sample program (at the bottom of this message) to test this speed...
  it creates a graph in a very natural way:  pairs of objects that should be connected by an edge are iterated through.  We happen to give every possible pair of objects in this example so the complete graph is created.

Here is runsnake image of where the running-time is going:

the runsnake comparison

The Code:

#!/usr/bin/python
"""
Create graphs in networkx and graph-tool.
"""

import networkx as nx
from graph_tool.all import *

from itertools import combinations


def graph_tool_create():
  """ Create a graph_tool graph given a list of pairs. """
  G = Graph(directed=False)      
  objectset = set()
  for o1,o2 in get_pairs_of_objects():
    if(o1 not in objectset):
      u = G.add_vertex()
      objectset.add(o1)
    if(o2 not in objectset):
      v = G.add_vertex()
      objectset.add(o2)

    G.add_edge(u,v)


def nx_create():
  """ Create a graph_tool graph given a list of pairs. """
  G = nx.Graph()      
  for o1,o2 in get_pairs_of_objects():
    G.add_edge(o1,o2)


def get_pairs_of_objects():
  """ Generate pairs of objects.  """
  n = 5000
  for a,b in combinations(range(n),2):
    yield a,b


graph_tool_create()
nx_create()
Reply | Threaded
Open this post in threaded view
|

Re: efficiency of graph creation

thekswenson
I posted code with missing lines...
  here is the good code:

#!/usr/bin/python
"""
Create graphs in networkx and graph-tool.
"""

import networkx as nx
from graph_tool.all import *
import igraph

from itertools import combinations


def graph_tool_create():
  """ Create a graph_tool graph given a list of pairs. """
  G = Graph(directed=False)
  objectTOv = {}
  for o1,o2 in get_pairs_of_objects():
    if(o1 in objectTOv):
      u = objectTOv[o1]
    else:
      u = G.add_vertex()
      objectTOv[o1] = u
    if(o2 in objectTOv):
      v = objectTOv[o2]
    else:
      v = G.add_vertex()
      objectTOv[o2]

    G.add_edge(u,v)


def nx_create():
  """ Create a graph_tool graph given a list of pairs. """
  G = nx.Graph()
  for o1,o2 in get_pairs_of_objects():
    G.add_edge(o1,o2)


def get_pairs_of_objects():
  """ Generate pairs of objects.  """
  n = 3000
  for a,b in combinations(range(n),2):
    yield a,b


graph_tool_create()
nx_create()
Reply | Threaded
Open this post in threaded view
|

Re: efficiency of graph creation

Tiago Peixoto
Administrator
In reply to this post by thekswenson
On 27.04.2015 14:29, thekswenson wrote:
> I've been using networkx to simply create a graph and check the connected
> components.  The bottleneck of the operation is the creation of the edges.
>
> I've heard that graph-tool is very efficient so I've replaces the code with
> a graph-tool graph.
> To my surprise, the creation of a graph-tool graph is MUCH slower than that
> of a networkx graph.
>
> Am I doing something wrong?

How does the performance change if you create the necessary edges
beforehand?

In graph-tool things are faster than in networkx when they are delegated
to C++, otherwise this should be comparable in speed. In the case of
adding many edges, this is done by using the Graph.add_edge_list()
function, which runs in C++ internally. In your example, this should
provide a massive speed-up.

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>


_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool

signature.asc (836 bytes) Download Attachment
--
Tiago de Paula Peixoto <tiago@skewed.de>
Reply | Threaded
Open this post in threaded view
|

Re: efficiency of graph creation

thekswenson
Thanks for the quick response Thiago!

In this code all the edges and vertices are created by graph-tool and the result is something much faster...
   is this the best I can do?

It's somewhat annoying to have to keep track of the vertices that will be created like this:

def graph_tool_create_all_at_once():
  """ Create a graph_tool graph given a list of pairs. """
  G = Graph(directed=False)
  objectTOi = {}
  vertexpairs = []
  counter = 0
  for o1,o2 in get_pairs_of_ints():
    if(o1 in objectTOi):
      u = objectTOi[o1]
    else:
      u = counter
      counter += 1
      objectTOi[o1] = u
    if(o2 in objectTOi):
      v = objectTOi[o2]
    else:
      v = counter
      counter += 1
      objectTOi[o2] = v

    vertexpairs.append((u,v))

  G.add_edge_list(vertexpairs)


On 27 April 2015 at 16:39, Tiago de Paula Peixoto <[hidden email]> wrote:
On 27.04.2015 14:29, thekswenson wrote:
> I've been using networkx to simply create a graph and check the connected
> components.  The bottleneck of the operation is the creation of the edges.
>
> I've heard that graph-tool is very efficient so I've replaces the code with
> a graph-tool graph.
> To my surprise, the creation of a graph-tool graph is MUCH slower than that
> of a networkx graph.
>
> Am I doing something wrong?

How does the performance change if you create the necessary edges
beforehand?

In graph-tool things are faster than in networkx when they are delegated
to C++, otherwise this should be comparable in speed. In the case of
adding many edges, this is done by using the Graph.add_edge_list()
function, which runs in C++ internally. In your example, this should
provide a massive speed-up.

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>


_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool



_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool
Reply | Threaded
Open this post in threaded view
|

Re: efficiency of graph creation

Elliot
I solved this once by making a NoDupesGraph where you could add edges with just the names of vertices .

class NoDupesGraph(Graph):
    '''Add nodes without worrying if it is a duplicate.
       Add edges without worrying if nodes exist   '''

    def __init__(self,*args,**kwargs):
        Graph.__init__(self,*args,**kwargs)
        self._nodes = {}

    def add_nodupe_vertex(self,label,*args,**kwargs):
      '''Return a node with label. Create node if label is new'''
      try:
          n = self._nodes[label]
      except KeyError:
          n = self.add_vertex()
          self._nodes[label]=n
      return n

    def add_nodupe_edge(self, n1_label, n2_label,directed=False):
      """
      Get or create edges using get_or_create_node
      """
      #there may be two if graph is directed but edge isn't
      edges = []

      n1 = self.add_nodupe_vertex(n1_label)
      n2 = self.add_nodupe_vertex(n2_label)
      edges.append(self.add_edge(n1,n2))
      if self.is_directed() and not directed:
          edges.append(self.add_edge(n2,n1))
      return edges

    def flush_empty_nodes(self):
        '''not implemented'''
        pass

    def condense_edges(self):
        '''if a node connects to only two edges, combine those
        edges and delete the node.
       
        not implemented
        '''
        pass

This could be easily modified to suit your need.

On Mon, Apr 27, 2015 at 3:23 PM, Krister <[hidden email]> wrote:
Thanks for the quick response Thiago!

In this code all the edges and vertices are created by graph-tool and the result is something much faster...
   is this the best I can do?

It's somewhat annoying to have to keep track of the vertices that will be created like this:

def graph_tool_create_all_at_once():
  """ Create a graph_tool graph given a list of pairs. """
  G = Graph(directed=False)
  objectTOi = {}
  vertexpairs = []
  counter = 0
  for o1,o2 in get_pairs_of_ints():
    if(o1 in objectTOi):
      u = objectTOi[o1]
    else:
      u = counter
      counter += 1
      objectTOi[o1] = u
    if(o2 in objectTOi):
      v = objectTOi[o2]
    else:
      v = counter
      counter += 1
      objectTOi[o2] = v

    vertexpairs.append((u,v))

  G.add_edge_list(vertexpairs)


On 27 April 2015 at 16:39, Tiago de Paula Peixoto <[hidden email]> wrote:
On 27.04.2015 14:29, thekswenson wrote:
> I've been using networkx to simply create a graph and check the connected
> components.  The bottleneck of the operation is the creation of the edges.
>
> I've heard that graph-tool is very efficient so I've replaces the code with
> a graph-tool graph.
> To my surprise, the creation of a graph-tool graph is MUCH slower than that
> of a networkx graph.
>
> Am I doing something wrong?

How does the performance change if you create the necessary edges
beforehand?

In graph-tool things are faster than in networkx when they are delegated
to C++, otherwise this should be comparable in speed. In the case of
adding many edges, this is done by using the Graph.add_edge_list()
function, which runs in C++ internally. In your example, this should
provide a massive speed-up.

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>


_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool



_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool



_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool
Reply | Threaded
Open this post in threaded view
|

Re: efficiency of graph creation

thekswenson
Hi offonoffon...
  won't the creation of the vertices and edges one at a time end up dramatically affecting the performance of graph creation (see my original question at the top of the thread)?

I've found that creating the edges one at a time is MUCH slower than creating them all at once and that creating the vertices one at a time is a little slower.

What I have is not pretty though.

On 28 April 2015 at 01:11, ... <[hidden email]> wrote:
I solved this once by making a NoDupesGraph where you could add edges with just the names of vertices .

class NoDupesGraph(Graph):
    '''Add nodes without worrying if it is a duplicate.
       Add edges without worrying if nodes exist   '''

    def __init__(self,*args,**kwargs):
        Graph.__init__(self,*args,**kwargs)
        self._nodes = {}

    def add_nodupe_vertex(self,label,*args,**kwargs):
      '''Return a node with label. Create node if label is new'''
      try:
          n = self._nodes[label]
      except KeyError:
          n = self.add_vertex()
          self._nodes[label]=n
      return n

    def add_nodupe_edge(self, n1_label, n2_label,directed=False):
      """
      Get or create edges using get_or_create_node
      """
      #there may be two if graph is directed but edge isn't
      edges = []

      n1 = self.add_nodupe_vertex(n1_label)
      n2 = self.add_nodupe_vertex(n2_label)
      edges.append(self.add_edge(n1,n2))
      if self.is_directed() and not directed:
          edges.append(self.add_edge(n2,n1))
      return edges

    def flush_empty_nodes(self):
        '''not implemented'''
        pass

    def condense_edges(self):
        '''if a node connects to only two edges, combine those
        edges and delete the node.
       
        not implemented
        '''
        pass

This could be easily modified to suit your need.

On Mon, Apr 27, 2015 at 3:23 PM, Krister <[hidden email]> wrote:
Thanks for the quick response Thiago!

In this code all the edges and vertices are created by graph-tool and the result is something much faster...
   is this the best I can do?

It's somewhat annoying to have to keep track of the vertices that will be created like this:

def graph_tool_create_all_at_once():
  """ Create a graph_tool graph given a list of pairs. """
  G = Graph(directed=False)
  objectTOi = {}
  vertexpairs = []
  counter = 0
  for o1,o2 in get_pairs_of_ints():
    if(o1 in objectTOi):
      u = objectTOi[o1]
    else:
      u = counter
      counter += 1
      objectTOi[o1] = u
    if(o2 in objectTOi):
      v = objectTOi[o2]
    else:
      v = counter
      counter += 1
      objectTOi[o2] = v

    vertexpairs.append((u,v))

  G.add_edge_list(vertexpairs)


On 27 April 2015 at 16:39, Tiago de Paula Peixoto <[hidden email]> wrote:
On 27.04.2015 14:29, thekswenson wrote:
> I've been using networkx to simply create a graph and check the connected
> components.  The bottleneck of the operation is the creation of the edges.
>
> I've heard that graph-tool is very efficient so I've replaces the code with
> a graph-tool graph.
> To my surprise, the creation of a graph-tool graph is MUCH slower than that
> of a networkx graph.
>
> Am I doing something wrong?

How does the performance change if you create the necessary edges
beforehand?

In graph-tool things are faster than in networkx when they are delegated
to C++, otherwise this should be comparable in speed. In the case of
adding many edges, this is done by using the Graph.add_edge_list()
function, which runs in C++ internally. In your example, this should
provide a massive speed-up.

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>


_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool



_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool



_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool



_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool
Reply | Threaded
Open this post in threaded view
|

Re: efficiency of graph creation

Elliot

Hi offonoffon...
  won't the creation of the vertices and edges one at a time end up dramatically affecting the performance of graph creation (see my original question at the top of the thread)?


Yes, and that has been solved already.  I was responding to "It's somewhat annoying to have to keep track of the vertices that will be created like this", and to the observation that your code is maybe not as flexible/reuseable as it could be.  And as I said, my code could be easily modified to add the edges as a batch instead of one at a time.

-Elliot



_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool
Reply | Threaded
Open this post in threaded view
|

Re: efficiency of graph creation

thekswenson
I agree that hiding the mess inside a class could make the view from outside much nicer.

On 28 April 2015 at 17:45, ... <[hidden email]> wrote:

Hi offonoffon...
  won't the creation of the vertices and edges one at a time end up dramatically affecting the performance of graph creation (see my original question at the top of the thread)?


Yes, and that has been solved already.  I was responding to "It's somewhat annoying to have to keep track of the vertices that will be created like this", and to the observation that your code is maybe not as flexible/reuseable as it could be.  And as I said, my code could be easily modified to add the edges as a batch instead of one at a time.

-Elliot



_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool



_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool
Reply | Threaded
Open this post in threaded view
|

Re: efficiency of graph creation

Tiago Peixoto
Administrator
In reply to this post by thekswenson
On 27.04.2015 22:23, Krister wrote:

> Thanks for the quick response Thiago!
>
> In this code all the edges and vertices are created by graph-tool and the result is something much faster...
>    is this the best I can do?
>
> It's somewhat annoying to have to keep track of the vertices that will be created like this:
>
> def graph_tool_create_all_at_once():
>   """ Create a graph_tool graph given a list of pairs. """
>   G = Graph(directed=False)
>   objectTOi = {}
>   vertexpairs = []
>   counter = 0
>   for o1,o2 in get_pairs_of_ints():
>     if(o1 in objectTOi):
>       u = objectTOi[o1]
>     else:
>       u = counter
>       counter += 1
>       objectTOi[o1] = u
>     if(o2 in objectTOi):
>       v = objectTOi[o2]
>     else:
>       v = counter
>       counter += 1
>       objectTOi[o2] = v
>
>     vertexpairs.append((u,v))
>
>   G.add_edge_list(vertexpairs)
Yes, it is possible to improve this. If your o1 and o2 objects are
always ints (as I gather from get_pairs_of_ints()), then you don't need
these dictionaries at all. The G.add_edge_list() will create missing
nodes as necessary. So you only need to do:

   G.add_edge_list(list(get_pairs_of_ints()))

This will be much faster. However it requires your ints to be within
some reasonable range, since missing ints without edges will also be
created. But you are better off guaranteeing this is the case when you
generate the ints in the first place.

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>


_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool

signature.asc (836 bytes) Download Attachment
--
Tiago de Paula Peixoto <tiago@skewed.de>
Reply | Threaded
Open this post in threaded view
|

Re: efficiency of graph creation

thekswenson
Sorry Thiago...
   I accidentally left the get_pairs_of_ints where get_pairs_of_objects should be.

The code that creates all edges using integer indices is indeed very fast.


On 29 April 2015 at 12:16, Tiago de Paula Peixoto <[hidden email]> wrote:
On 27.04.2015 22:23, Krister wrote:
> Thanks for the quick response Thiago!
>
> In this code all the edges and vertices are created by graph-tool and the result is something much faster...
>    is this the best I can do?
>
> It's somewhat annoying to have to keep track of the vertices that will be created like this:
>
> def graph_tool_create_all_at_once():
>   """ Create a graph_tool graph given a list of pairs. """
>   G = Graph(directed=False)
>   objectTOi = {}
>   vertexpairs = []
>   counter = 0
>   for o1,o2 in get_pairs_of_ints():
>     if(o1 in objectTOi):
>       u = objectTOi[o1]
>     else:
>       u = counter
>       counter += 1
>       objectTOi[o1] = u
>     if(o2 in objectTOi):
>       v = objectTOi[o2]
>     else:
>       v = counter
>       counter += 1
>       objectTOi[o2] = v
>
>     vertexpairs.append((u,v))
>
>   G.add_edge_list(vertexpairs)

Yes, it is possible to improve this. If your o1 and o2 objects are
always ints (as I gather from get_pairs_of_ints()), then you don't need
these dictionaries at all. The G.add_edge_list() will create missing
nodes as necessary. So you only need to do:

   G.add_edge_list(list(get_pairs_of_ints()))

This will be much faster. However it requires your ints to be within
some reasonable range, since missing ints without edges will also be
created. But you are better off guaranteeing this is the case when you
generate the ints in the first place.

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>


_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool



_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool
Reply | Threaded
Open this post in threaded view
|

Re: efficiency of graph creation

Tiago Peixoto
Administrator
On 29.04.2015 18:12, Krister wrote:
> Sorry Thiago...
>    I accidentally left the get_pairs_of_ints where get_pairs_of_objects should be.
>
> The code that creates all edges using integer indices is indeed very fast.

Ok. Still, the best option would be not to iterate over all pairs to
construct the mapping to ints, but to iterate only through the
individual objects.

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>


_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool

signature.asc (836 bytes) Download Attachment
--
Tiago de Paula Peixoto <tiago@skewed.de>