Speeding up algorithms with Cython, etc.?

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Speeding up algorithms with Cython, etc.?

Gareth Simons
I have a function that I’m running on subgraphs. It is effectively a bread-first outwards search for all paths from the originating vertex. It assigns probabilities to each of the end-points, which are then used for computing a type of index.

In terms of speed, I’ve wondered about three things:
1 - Whether it would be faster to adapt one of the existing graph-tool search algorithms and find a way to assign probabilities from the results…
2 - Whether there are ways to more speedily implement this algorithm through rejigging the algorithm;
3 - And, whether implementing through Cython may provide speedups.

Regarding #1, I’ve looked at using the built-in breadth-first search, but I haven’t yet figured out how to use the visitor object to create the list of probabilities. Though will try to take a closer look at this sometime soon.

Regarding #3, I’ve looked at a Cython implementation, but it doesn’t appear to provide any significant speedups. I suspect this is because the function uses graph and vertex objects and I’m assuming that Cython doesn’t have the capacity to efficiently implement these.

Any pointers would be appreciated. Thanks.

Here is the function: (Currently a .pyx file for Cython)

from scipy.stats import entropy

def nodeOutProb(g, v):
    # Calculates probabilities of all possible routes from node within cutoff distance
    visitedNodes = set()
    claimedEdges = set()
    branches = {}
    probabilities = []

    # add origin
    cdef float o = 1.0
    branches[v] = o # add starting node with starting probability
    visitedNodes.add(v) # add starting node to visited set

    cdef float r, cV, newVal

    # start iterating through branches...adding and removing as you go
    while len(branches) > 0:
        for key, val in branches.items():
            newVal = 0.0
            newEdges = []
            for e in key.all_edges():
                if e not in claimedEdges:
                    newVal += 1
                    claimedEdges.add(e)
                    newEdges.append(e)
            if newVal == 0:
                probabilities.append(1/val)
            else:
                cV = newVal * val
                for n in key.all_neighbours():
                    if n not in visitedNodes:
                        visitedNodes.add(n)
                        branches[n] = cV
                    elif g.edge(key, n) in newEdges:
                        probabilities.append(1/cV)
            del branches[key]
    return entropy(probabilities, base=2)

_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool
Reply | Threaded
Open this post in threaded view
|

Re: Speeding up algorithms with Cython, etc.?

Elliot

Is edge object creation the cause of the slowness? When you call vertex.all_edges, aren't new edge objects created?  And if so, are you creating multiple edge objects for each edge over the course of the loop?

Did you profile your code before moving to cython?  optimization before profiling is usually premature.

-elliot

On Jul 27, 2014 3:08 PM, "Gareth Simons" <[hidden email]> wrote:
I have a function that I’m running on subgraphs. It is effectively a bread-first outwards search for all paths from the originating vertex. It assigns probabilities to each of the end-points, which are then used for computing a type of index.

In terms of speed, I’ve wondered about three things:
1 - Whether it would be faster to adapt one of the existing graph-tool search algorithms and find a way to assign probabilities from the results…
2 - Whether there are ways to more speedily implement this algorithm through rejigging the algorithm;
3 - And, whether implementing through Cython may provide speedups.

Regarding #1, I’ve looked at using the built-in breadth-first search, but I haven’t yet figured out how to use the visitor object to create the list of probabilities. Though will try to take a closer look at this sometime soon.

Regarding #3, I’ve looked at a Cython implementation, but it doesn’t appear to provide any significant speedups. I suspect this is because the function uses graph and vertex objects and I’m assuming that Cython doesn’t have the capacity to efficiently implement these.

Any pointers would be appreciated. Thanks.

Here is the function: (Currently a .pyx file for Cython)

from scipy.stats import entropy

def nodeOutProb(g, v):
    # Calculates probabilities of all possible routes from node within cutoff distance
    visitedNodes = set()
    claimedEdges = set()
    branches = {}
    probabilities = []

    # add origin
    cdef float o = 1.0
    branches[v] = o # add starting node with starting probability
    visitedNodes.add(v) # add starting node to visited set

    cdef float r, cV, newVal

    # start iterating through branches...adding and removing as you go
    while len(branches) > 0:
        for key, val in branches.items():
            newVal = 0.0
            newEdges = []
            for e in key.all_edges():
                if e not in claimedEdges:
                    newVal += 1
                    claimedEdges.add(e)
                    newEdges.append(e)
            if newVal == 0:
                probabilities.append(1/val)
            else:
                cV = newVal * val
                for n in key.all_neighbours():
                    if n not in visitedNodes:
                        visitedNodes.add(n)
                        branches[n] = cV
                    elif g.edge(key, n) in newEdges:
                        probabilities.append(1/cV)
            del branches[key]
    return entropy(probabilities, base=2)

_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool


_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool
Reply | Threaded
Open this post in threaded view
|

Re: Speeding up algorithms with Cython, etc.?

Elliot Hallmark

Also, you have python objects in your inner loop, so I don't think cython will do anything for this code.  You declare a couple of variables, but the most important stuff you left dynamically typed.

Elliot

On Jul 27, 2014 5:22 PM, "..." <[hidden email]> wrote:

Is edge object creation the cause of the slowness? When you call vertex.all_edges, aren't new edge objects created?  And if so, are you creating multiple edge objects for each edge over the course of the loop?

Did you profile your code before moving to cython?  optimization before profiling is usually premature.

-elliot

On Jul 27, 2014 3:08 PM, "Gareth Simons" <[hidden email]> wrote:
I have a function that I’m running on subgraphs. It is effectively a bread-first outwards search for all paths from the originating vertex. It assigns probabilities to each of the end-points, which are then used for computing a type of index.

In terms of speed, I’ve wondered about three things:
1 - Whether it would be faster to adapt one of the existing graph-tool search algorithms and find a way to assign probabilities from the results…
2 - Whether there are ways to more speedily implement this algorithm through rejigging the algorithm;
3 - And, whether implementing through Cython may provide speedups.

Regarding #1, I’ve looked at using the built-in breadth-first search, but I haven’t yet figured out how to use the visitor object to create the list of probabilities. Though will try to take a closer look at this sometime soon.

Regarding #3, I’ve looked at a Cython implementation, but it doesn’t appear to provide any significant speedups. I suspect this is because the function uses graph and vertex objects and I’m assuming that Cython doesn’t have the capacity to efficiently implement these.

Any pointers would be appreciated. Thanks.

Here is the function: (Currently a .pyx file for Cython)

from scipy.stats import entropy

def nodeOutProb(g, v):
    # Calculates probabilities of all possible routes from node within cutoff distance
    visitedNodes = set()
    claimedEdges = set()
    branches = {}
    probabilities = []

    # add origin
    cdef float o = 1.0
    branches[v] = o # add starting node with starting probability
    visitedNodes.add(v) # add starting node to visited set

    cdef float r, cV, newVal

    # start iterating through branches...adding and removing as you go
    while len(branches) > 0:
        for key, val in branches.items():
            newVal = 0.0
            newEdges = []
            for e in key.all_edges():
                if e not in claimedEdges:
                    newVal += 1
                    claimedEdges.add(e)
                    newEdges.append(e)
            if newVal == 0:
                probabilities.append(1/val)
            else:
                cV = newVal * val
                for n in key.all_neighbours():
                    if n not in visitedNodes:
                        visitedNodes.add(n)
                        branches[n] = cV
                    elif g.edge(key, n) in newEdges:
                        probabilities.append(1/cV)
            del branches[key]
    return entropy(probabilities, base=2)

_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool


_______________________________________________
graph-tool mailing list
[hidden email]
http://lists.skewed.de/mailman/listinfo/graph-tool