FAQ
In the following code I create the graph with vertices
sgb-words.txt (the file of 5 letter words from the
stanford graphbase), and an edge if two words differ
by one letter. The two methods I wrote seem to me to
likely perform the same computations, the list comprehension
is faster though (281 seconds VS 305 seconds on my dell mini).

Is the right interpretation of this timing difference
that the comprehension is performed in the lower level
C code?

As this time I have no other conjecture about the cause.

---------------------------------------------------------
import time
import copy

data = map (lambda x: x.strip(), open('sgb-words.txt').readlines())

def d (w1, w2):
count = 0
for idx in range(0,5):
if w1[idx] != w2[idx]:
count += 1
return count

print "creating graph"
t0 = time.clock ()
graph = [[a,b] for a in data for b in data if d(a,b) ==1 and a < b]
t1 = time.clock ()
print "took " + str (t1 - t0) + " seconds."

t0 = time.clock ()
graph2 = []
for i in range (0, len(data)):
for j in range(0,len(data)):
if d(data[i],data[j]) == 1 and i < j:
graph2.append ([i,j])
t1 = time.clock ()
print "took " + str (t1 - t0) + " seconds."

Search Discussions

  • MRAB at Sep 2, 2011 at 1:12 am

    On 02/09/2011 01:35, Bart Kastermans wrote:
    In the following code I create the graph with vertices
    sgb-words.txt (the file of 5 letter words from the
    stanford graphbase), and an edge if two words differ
    by one letter. The two methods I wrote seem to me to
    likely perform the same computations, the list comprehension
    is faster though (281 seconds VS 305 seconds on my dell mini).

    Is the right interpretation of this timing difference
    that the comprehension is performed in the lower level
    C code?

    As this time I have no other conjecture about the cause.

    ---------------------------------------------------------
    import time
    import copy

    data = map (lambda x: x.strip(), open('sgb-words.txt').readlines())

    def d (w1, w2):
    count = 0
    for idx in range(0,5):
    if w1[idx] != w2[idx]:
    count += 1
    return count

    print "creating graph"
    t0 = time.clock ()
    graph = [[a,b] for a in data for b in data if d(a,b) ==1 and a< b]
    t1 = time.clock ()
    print "took " + str (t1 - t0) + " seconds."

    t0 = time.clock ()
    graph2 = []
    for i in range (0, len(data)):
    for j in range(0,len(data)):
    if d(data[i],data[j]) == 1 and i< j:
    graph2.append ([i,j])
    t1 = time.clock ()
    print "took " + str (t1 - t0) + " seconds."
    Are they actually equivalent? Does graph == graph2?

    The first version (list comprehension) creates a list of pairs of
    values:

    [a, b]

    whereas the second version (for loops) creates a list of pairs of
    indexes:

    [i, j]

    The second version has subscripting ("data[i]" and "data[j]"), which
    will slow it down.
  • Bart Kastermans at Sep 2, 2011 at 1:54 pm

    MRAB <python at mrabarnett.plus.com> writes:
    On 02/09/2011 01:35, Bart Kastermans wrote:

    graph = [[a,b] for a in data for b in data if d(a,b) ==1 and a< b]
    graph2 = []
    for i in range (0, len(data)):
    for j in range(0,len(data)):
    if d(data[i],data[j]) == 1 and i< j:
    graph2.append ([i,j])
    Are they actually equivalent? Does graph == graph2?

    The first version (list comprehension) creates a list of pairs of
    values:

    [a, b]

    whereas the second version (for loops) creates a list of pairs of
    indexes:

    [i, j]

    The second version has subscripting ("data[i]" and "data[j]"), which
    will slow it down.
    You are absolutely right. I had changed the code from the
    equivalent:

    graph2 = []
    for i in range (0, len(data)):
    for j in range(0,len(data)):
    if d(data[i],data[j]) == 1 and i < j:
    graph2.append ([data[i],data[j]])

    But then also tried the equivalent

    for a in data:
    for b in data:
    if d(a,b) == 1 and a < b:
    graph2.append([a,b])

    Which does away with the indexing, and is just about exactly as
    fast as the list comprehension.


    That'll teach me; I almost didn't ask the question thinking it might
    be silly. And it was, but I thought it for the wrong reason. I tell my
    students there are no stupid questions, I should listen to myself
    more when I do. Thanks!
  • Ting at Sep 2, 2011 at 2:50 pm

    On Sep 2, 9:54?am, Bart Kastermans wrote:
    if d(a,b) == 1 and a < b:
    It will probably be faster if you reverse the evaluation order of that
    expression.

    if a<b and d(a,b)==1:

    That way the d() function is called less than half the time. Of course
    this assumes that a<b is a faster evaluation than d(a,b), but I think
    that's true for your example.
    --
    // T.Hsu
  • Bart Kastermans at Sep 3, 2011 at 2:15 am

    ting at thsu.org writes:
    On Sep 2, 9:54?am, Bart Kastermans wrote:
    if d(a,b) == 1 and a < b:
    It will probably be faster if you reverse the evaluation order of that
    expression.

    if a<b and d(a,b)==1:

    That way the d() function is called less than half the time. Of course
    this assumes that a<b is a faster evaluation than d(a,b), but I think
    that's true for your example.
    --
    // T.Hsu
    Indeed makes quite a difference, goes from 275 seconds down to
    153 seconds.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppython-list @
categoriespython
postedSep 2, '11 at 12:35a
activeSep 3, '11 at 2:15a
posts5
users3
websitepython.org

3 users in discussion

Bart Kastermans: 3 posts MRAB: 1 post Ting: 1 post

People

Translate

site design / logo © 2022 Grokbase