Please, can anybody write me a simple recursive matrix multiplication
using multiple threads in Python, or at least show me some guidelines
how to write it myself

Thank You


  • Ulrich Eckhardt at Jan 4, 2011 at 10:15 am

    Zdenko wrote:
    Please, can anybody write me a simple recursive matrix multiplication
    using multiple threads in Python, or at least show me some guidelines
    how to write it myself
    No problem, I just need your bank account data to withdraw the payment and
    the address of your teacher whom to send the results. ;^)

    Seriously, things don't work like this. If you show effort, you will get
    help, but you don't get working solutions without even a bit of effort on
    your own.

    Start with a recursive algorithm, then make it multithreaded. Be prepared to
    make some experiments on how to distribute tasks to threads and collect
    them again.

    Uli
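
    That starting point might look like the sketch below (not from the
    original post; the helper names split, add_m, join_q and rmatmul are
    hypothetical, and square matrices whose size is a power of two are
    assumed). The four quadrant products of the result are independent of
    one another, which is where threads or worker processes can later be
    attached.

    def split(M):
        # Split matrix M into four quadrants (new lists of lists).
        n = len(M) // 2
        return ([row[:n] for row in M[:n]], [row[n:] for row in M[:n]],
                [row[:n] for row in M[n:]], [row[n:] for row in M[n:]])

    def add_m(X, Y):
        # Element-wise sum of two equally sized matrices.
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def join_q(C11, C12, C21, C22):
        # Reassemble four quadrants into one matrix.
        top = [r1 + r2 for r1, r2 in zip(C11, C12)]
        bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
        return top + bottom

    def rmatmul(A, B):
        # Recursive block multiplication: C11 = A11*B11 + A12*B21, etc.
        n = len(A)
        if n == 1:
            return [[A[0][0] * B[0][0]]]
        A11, A12, A21, A22 = split(A)
        B11, B12, B21, B22 = split(B)
        return join_q(add_m(rmatmul(A11, B11), rmatmul(A12, B21)),
                      add_m(rmatmul(A11, B12), rmatmul(A12, B22)),
                      add_m(rmatmul(A21, B11), rmatmul(A22, B21)),
                      add_m(rmatmul(A21, B12), rmatmul(A22, B22)))

    The same recursion skeleton carries over to Strassen's algorithm,
    which trades the eight recursive products for seven.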


  • Grégory Leocadie at Jan 4, 2011 at 11:05 am

    On 4 jan, 11:15, Ulrich Eckhardt wrote:
    Zdenko wrote:
    Please, can anybody write me a simple recursive matrix multiplication
    using multiple threads in Python, or at least show me some guidelines
    how to write it myself
    No problem, I just need your bank account data to withdraw the payment and
    the address of your teacher whom to send the results. ;^)

    Seriously, things don't work like this. If you show effort, you will get
    help, but you don't get working solutions without even a bit of effort on
    your own.

    Start with a recursive algorithm, then make it multithreaded. Be prepared to
    make some experiments on how to distribute tasks to threads and collect
    them again.

    Uli
    Hi,

    just have a look here http://en.wikipedia.org/wiki/Matrix_multiplication
    and find out what parts can be parallelized ;)

    Gregory LEOCADIE
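
    The property to take away from that definition is that each entry
    C[i][j] is the dot product of row i of A with column j of B, so every
    row (or block) of C can be computed independently of the others. A
    small sketch of that decomposition (the helper name product_row is
    hypothetical):

    def product_row(A, B, i):
        # Row i of C depends only on A[i] and on B; no other row of C is
        # touched, so calls for different i can run in parallel workers.
        cols = len(B[0])
        return [sum(A[i][k] * B[k][j] for k in range(len(B)))
                for j in range(cols)]

    # C = [product_row(A, B, i) for i in range(len(A))]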
  • Zdenko at Jan 4, 2011 at 12:22 pm

    On 4.1.2011 11:15, Ulrich Eckhardt wrote:
    Zdenko wrote:
    Please, can anybody write me a simple recursive matrix multiplication
    using multiple threads in Python, or at least show me some guidelines
    how to write it myself
    No problem, I just need your bank account data to withdraw the payment and
    the address of your teacher whom to send the results. ;^)

    Seriously, things don't work like this. If you show effort, you will get
    help, but you don't get working solutions without even a bit of effort on
    your own.

    Start with a recursive algorithm, then make it multithreaded. Be prepared to
    make some experiments on how to distribute tasks to threads and collect
    them again.

    Uli
    Ok, I expected something like this :)
    Let's try it this way:

    I wrote these two programs as examples.

    This one is the naive matrix multiplication, using one thread:

    from random import randint
    from time import clock
    import threading

    t0 = clock()
    A = []
    B = []
    C = []
    m = 100  # size of matrix m x m (original value lost; 100 assumed)

    for i in range(m):
        A.append([])
        B.append([])
        C.append([])
        for j in range(m):
            A[i].append(randint(1, 9))
            B[i].append(randint(1, 9))
            C[i].append(0)

    class MyThread(threading.Thread):

        def run(self):
            # plain triple loop: C = A * B
            for i in range(m):
                for j in range(m):
                    for k in range(m):
                        C[i][j] += A[i][k] * B[k][j]

    t = MyThread()
    t.start()
    t.join()
    print clock() - t0, "seconds"



    This one uses two threads, each multiplying half of the matrix:

    import time
    from threading import Thread
    from random import randint

    t0 = time.clock()

    class MyThread(Thread):
        def __init__(self, poc, kr):
            Thread.__init__(self)
            self.poc = poc  # first row (inclusive)
            self.kr = kr    # last row (exclusive)
        def run(self):
            # each thread multiplies its own band of rows
            for i in range(self.poc, self.kr):
                for j in range(m):
                    for k in range(m):
                        C[i][j] += A[i][k] * B[k][j]

    A = []
    B = []
    C = []
    m = 100  # size of matrix m x m (original value lost; 100 assumed)

    for i in range(m):
        A.append([])
        B.append([])
        C.append([])
        for j in range(m):
            A[i].append(randint(1, 9))
            B[i].append(randint(1, 9))
            C[i].append(0)

    thr1 = MyThread(0, m / 2)
    thr2 = MyThread(m / 2, m)
    thr1.start()
    thr2.start()
    thr1.join()
    thr2.join()
    print time.clock() - t0, "seconds"

    Why is the second one more than twice as slow as the first, when it
    should be faster? I suspect the problem is simultaneous access to
    matrix C, but I don't know how to solve it.

    I used this just as an example to see how threading works, so that I
    can move on to the recursive and Strassen algorithms.
  • DevPlayer at Jan 4, 2011 at 3:42 pm
    I would be very surprised if you achieve faster results by threading
    this problem. There has been much discussion of the benefits, or lack
    thereof, of threading in Python (or in general).

    Threading is best used in situations where you are doing different
    kinds of tasks. For example, you might want to run your matrix
    multiplication WHILE doing other things on your computer, with the
    multiplication as a background process chugging away whenever you are
    not taxing the computer with other work.

    Threading adds efficiency when you have lots of "blocking" operations,
    like reading in lots of big files from a comparatively slow hard drive
    (slow compared to how fast a CPU processes data) or waiting on network
    data (super slow compared to CPU processing).
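
    A small illustration of that point (file names are hypothetical):
    threads overlap the time each one spends blocked waiting on the disk,
    which is where they pay off.

    import threading

    def checksum_file(path, results):
        # For big files on a slow disk, most of the time here is spent
        # blocked on I/O, during which the GIL is released and the other
        # threads can run.
        total = 0
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(1 << 20), b''):
                total += sum(bytearray(chunk))
        results[path] = total

    results = {}
    paths = ['big_file_1.bin', 'big_file_2.bin']  # hypothetical files
    threads = [threading.Thread(target=checksum_file, args=(p, results))
               for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()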

    When you are doing mass numeric processing, you want to minimize
    jumping from one function to another, which adds overhead (recursion
    adds a small amount of unnecessary overhead), and you want to minimize
    the need for the CPU to switch between threads or processes.

    If you still feel the need to use threads for some reason, for numeric
    processing I'd recommend using a "lighter" thread object, like a
    microthread or green thread or tasklet or whatever they are calling
    them now.

    Another thing to note: it seems you might be expecting threads to run
    on different CPU cores and bring an improvement. Be careful with this
    assumption; it is not always true. It is up to the CPU and OS to
    determine how threads are handled, and perhaps the GIL to some extent.
    Be aware that Python has a GIL (in some implementations). Google it if
    you don't know of it.

    To make better use of multi-core CPUs you might consider the
    multiprocessing library included in Python 2.6 and above.
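
    A minimal sketch of that route (matrix size and contents assumed, as
    in the earlier examples): each worker process computes whole rows of
    C, so there is no shared write access to C at all.

    from multiprocessing import Pool
    from random import randint

    def compute_row(args):
        # One row of C = A * B; every row is an independent task.
        A_row, B = args
        cols = len(B[0])
        return [sum(A_row[k] * B[k][j] for k in range(len(B)))
                for j in range(cols)]

    if __name__ == '__main__':
        m = 100  # assumed size
        A = [[randint(1, 9) for _ in range(m)] for _ in range(m)]
        B = [[randint(1, 9) for _ in range(m)] for _ in range(m)]

        pool = Pool()  # defaults to one worker process per CPU core
        C = pool.map(compute_row, [(A[i], B) for i in range(m)])
        pool.close()
        pool.join()

    Note that B is pickled and shipped to the workers with every task
    here; for large matrices you would send it once (for example via a
    Pool initializer or shared memory), which is what a later reply in
    this thread discusses.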

    I'm assuming that speed is an issue because you were timing your code.
    If you are doing actual serious number crunching there's lots of
    advice on this. The Python NumPy package, as well as Stackless Python
    (for microthreads or whatever they're called), comes to mind.

    Another thought: ask yourself whether you need a large, live,
    in-memory set of processed numbers; in your case, a fully multiplied
    matrix. Usually a large set of in-memory numbers is something you're
    going to use to simulate a model or to process and crunch further.

    Or is your actual usage going to be picking out a processed number
    here and there from the matrix? If so, look at iterators or
    generators, which would give you a snapshot in time of your matrix
    multiplication. I like to think of Python generators as being like
    integral calculus (definition at:
    http://en.wikipedia.org/wiki/Integral_calculus), where the specific
    integral of a generator is often just 1.

    I'm loving generators a lot. For example, there are generator
    accelerators, which, if you think it through, means you can make
    generator decelerators, useful for doing interpolation between
    elements of your matrix, for example. I always forget whether
    generators are thread-safe, though.

    One indicator that generators could help: you're doing lots of for
    loops with range().
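
    A small sketch of that idea, assuming the same list-of-lists matrices
    as above: a generator that yields entries of the product on demand
    instead of materialising the whole result matrix.

    def product_entries(A, B):
        # Lazily yield ((i, j), value) for C = A * B, one entry at a time;
        # nothing is computed until the caller asks for the next entry.
        for i in range(len(A)):
            for j in range(len(B[0])):
                yield (i, j), sum(A[i][k] * B[k][j] for k in range(len(B)))

    # for (i, j), value in product_entries(A, B):
    #     ...  # consume entries as needed instead of building all of C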

    Also, it has been measured that list comprehensions are slightly
    faster than while loops, which are slightly faster than for loops. You
    can Google to confirm; search for something like "python fast
    iteration".

    Also, if the numbers in your matrix are actually not plain numbers but
    objects wrapping numbers, __slots__ is used for large sets of objects
    (tens of millions at the very least) to minimize memory usage, and
    perhaps helps speed, if used properly. Just mentioning it; I'd stay
    away from this, though.

    Some of my information may be inaccurate (or even completely wrong,
    like I always am about whether a thread is best switched during a
    blocking versus a non-blocking operation), but there are some things
    to consider.
  • DevPlayer at Jan 4, 2011 at 3:49 pm
  • DevPlayer at Jan 4, 2011 at 4:10 pm
    See section titled: "'array' or 'matrix'? Which should I use?"
    at
    http://www.scipy.org/NumPy_for_Matlab_Users

    BTW
    http://www.python.org/dev/peps/pep-0211/
  • Dan Stromberg at Jan 4, 2011 at 5:08 pm

    On Tue, Jan 4, 2011 at 4:22 AM, Zdenko wrote:
    [... quoted code snipped: a single-threaded triple-loop multiply and a
    two-thread version that splits the rows of C between the threads ...]
    Why is the second one more than twice as slow as the first, when it
    should be faster? I suspect the problem is simultaneous access to
    matrix C, but I don't know how to solve it.

    I used this just as an example to see how threading works, so that I
    can move on to the recursive and Strassen algorithms.
    Threads in CPython are decent when you're doing I/O, not so good when
    you're doing CPU-bound tasks. When you're doing CPU-bound tasks, you
    may actually find that your application runs about 1/n times as fast,
    where n is the number of CPUs - that is, much slower. I don't mean
    that it just doesn't scale well, I mean that each additional CPU makes
    things slower for CPU-bound jobs.

    If you require Java-like threading, you might be better off with
    Jython or IronPython for now.

    pypy's likely to have good threading eventually, but last I heard it
    wasn't quite there yet.

    If you do require CPython, you might check out the greenlets module
    and/or PyCSP.

    Oh, and there's "stackless python", which amounts to a branch of
    CPython that's unlikely to be merged. It's supposed to be able to
    thread really well, and ISTR hearing that it's pretty similar to
    greenlets. The main force behind stackless is working on pypy now, I
    believe.

    Finally, with CPython, you can use the multiprocessing module to get
    pretty good parallelism via distinct processes. This way you tend to
    end up using queues for interprocess communication; these queues use
    shared memory. However, you might actually be able to put your matrix
    in shared memory - if it's large enough to justify that. I know you
    could put a linear array of homogeneous, scalar type in shared memory;
    I've not heard of a way of putting an aggregate type in shared memory.
    Naturally, you can fake an n dimensional array using a 1 dimensional
    array with a little multiplication and addition.
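
    For instance (a sketch using multiprocessing.Array as the shared block
    of homogeneous scalars; the size is assumed): an m x m matrix stored
    flat is addressed as i * m + j.

    from multiprocessing import Array

    m = 4                       # assumed size for the sketch
    flat_c = Array('d', m * m)  # shared, zero-initialised doubles

    def set_cell(i, j, value):
        flat_c[i * m + j] = value   # 2-D coordinates -> 1-D offset

    def get_cell(i, j):
        return flat_c[i * m + j]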

    As an example of using CPython with multiprocessing without having
    your matrix in shared memory, you could probably have one worker
    subprocess for each core on the system, divide up the cells of your
    result matrix by their coordinates (perhaps using an iterator and izip
    across the queues) and send a message (via a shared memory queue) with
    those coordinates to the appropriate subprocess. Then those
    subprocesses send back the coordinates and the result at said
    coordinates via a different result queue. I suspect you'd want one
    queue for all the results, and one queue for each subprocess for
    initiating computation. Each subprocess would have its own COW copy of
    the input matrix.
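
    A rough sketch of one possible reading of that layout (it hands out
    whole rows rather than individual cells, to keep the message count
    down; sizes and contents are assumed): one task queue per worker, one
    shared result queue, and each worker holding its own copy of A and B.

    from multiprocessing import Process, Queue, cpu_count
    from random import randint

    def worker(A, B, task_q, result_q):
        # Each worker has its own copy of A and B (sent at start-up, or
        # inherited copy-on-write after a fork). It pulls row indices
        # until it sees the None sentinel.
        for i in iter(task_q.get, None):
            row = [sum(A[i][k] * B[k][j] for k in range(len(B)))
                   for j in range(len(B[0]))]
            result_q.put((i, row))

    if __name__ == '__main__':
        m = 100  # assumed size
        A = [[randint(1, 9) for _ in range(m)] for _ in range(m)]
        B = [[randint(1, 9) for _ in range(m)] for _ in range(m)]

        n_workers = cpu_count()
        result_q = Queue()
        task_qs = [Queue() for _ in range(n_workers)]
        procs = [Process(target=worker, args=(A, B, task_qs[w], result_q))
                 for w in range(n_workers)]
        for p in procs:
            p.start()

        # Deal the row indices out round-robin, one task queue per worker.
        for i in range(m):
            task_qs[i % n_workers].put(i)
        for q in task_qs:
            q.put(None)  # sentinel: no more work for this worker

        C = [None] * m
        for _ in range(m):
            i, row = result_q.get()
            C[i] = row

        for p in procs:
            p.join()

    Because each worker runs in its own interpreter with its own GIL,
    this kind of split can actually use multiple cores for pure-Python
    arithmetic, unlike the threaded versions above.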
  • Steven D'Aprano at Jan 4, 2011 at 10:48 pm

    On Tue, 04 Jan 2011 13:22:33 +0100, Zdenko wrote:

    I wrote these two codes for example:

    this one is naive way of matrix multiplication using one thread [...]
    this one is using two threads, each one is multiplying half of matrix [...]
    why is second one more than twice slower than first when it should be
    faster. I suspect that problem is in simultaneous access to matrix C but
    i don't know how to solve this.
    Can I suggest that your timing code leaves something to be desired?

    * You time too much. You're interested in how long it takes to multiply
    two matrices, but you time how long it takes to do much more than just
    the multiplication: your timing code covers the time it takes to create
    the class object (which will be trivial), and build the matrix (non-
    trivial), as well as perform the multiplication.

    * You perform the timing test once, which makes it subject to sampling
    errors. (Although if the process takes a long time, say, multiple
    seconds, the sampling error will *probably* be small relative to the
    measured time. But not always.)

    * You use time.clock, but without knowing which operating system you are
    running, it's impossible to tell whether you should be using time.time
    instead.

    Whether these issues will actually make a practical difference in this
    *specific* case, I don't know, but as a general rule, the most accurate
    way to perform these sorts of timing tests is with the timeit module.
    Something like this:


    A = ... # initialise matrix A
    B = ... # and B
    C = ... # and the result

    def mult1(A, B, C):
        # Multiply matrices A and B using 1 thread.
        t = MyThread()
        t.start()
        t.join()

    def mult2(A, B, C):
        # Multiply matrices A and B using 2 threads.
        t1 = MyThread()
        t2 = MyThread()
        t1.start()
        t2.start()
        t1.join() # Block until thread 1 is done.
        t2.join() # Now block until thread 2 is done.

    setup1 = "from __main__ import A, B, C, mult1"
    setup2 = "from __main__ import A, B, C, mult2"

    from timeit import Timer
    t1 = Timer("mult1(A, B, C)", setup1)
    t2 = Timer("mult2(A, B, C)", setup2)


    # Now perform the timing tests.
    best1 = min(t1.repeat())
    best2 = min(t2.repeat())

    By default, Timer.repeat will measure the time taken by one million
    iterations of the code snippet, and take three measurements. You almost
    always only care about the smallest measurement -- any larger times will
    represent sampling error.

    If it takes too long to time one million iterations, either make the
    matrices smaller, or pass keyword arguments number and/or repeat to the
    repeat method:

    # best of five measurements of one hundred iterations
    best1 = min(t1.repeat(number=100, repeat=5))



    --
    Steven
  • John Nagle at Jan 5, 2011 at 3:46 am

    On 1/4/2011 2:15 AM, Ulrich Eckhardt wrote:
    Zdenko wrote:
    Please, can anybody write me a simple recursive matrix multiplication
    using multiple threads in Python, or at least show me some guidelines
    how to write it myself
    No problem, I just need your bank account data to withdraw the payment and
    the address of your teacher whom to send the results. ;^)

    Seriously, things don't work like this. If you show effort, you will get
    help, but you don't get working solutions without even a bit of effort on
    your own.

    Start with a recursive algorithm, then make it multithreaded. Be prepared to
    make some experiments on how to distribute tasks to threads and collect
    them again.
    If you want to do matrix multiplication from Python, use NumPy.
    Anything you write in Python itself and run in CPython will be far,
    far slower. CPython without NumPy is about 60x slower than C on
    basic array math.
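
    For comparison, the NumPy version of the earlier examples is
    essentially one call (a sketch; it assumes NumPy is installed and the
    matrix size from the earlier examples):

    import numpy

    m = 100  # assumed size
    A = numpy.random.randint(1, 10, size=(m, m)).astype(float)  # 1..9
    B = numpy.random.randint(1, 10, size=(m, m)).astype(float)

    C = numpy.dot(A, B)  # the triple loop runs in compiled code

    When NumPy is linked against an optimised BLAS, that single dot call
    will typically also use multiple cores on its own, with no
    Python-level threading needed.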

    Writing high-speed parallel number-crunching code that outperforms
    off the shelf libraries is typically non-trivial. If there's some
    easy way to make a simple operation like matrix multiply go faster,
    it's probably been done already.

    If you're really into this, look into the Unified Parallel C effort
    and its libraries. The supercomputer types spend their lives doing this
    sort of thing.

    John Nagle
  • Tim Roberts at Jan 5, 2011 at 7:31 am

    Zdenko wrote:
    Please, can anybody write me a simple recursive matrix multiplication
    using multiple threads in Python, or at least show me some guidelines
    how to write it myself
    Matrix multiplication is not generally done recursively. There's no
    conceptual gain. It makes more sense iteratively.
    --
    Tim Roberts, timr at probo.com
    Providenza & Boekelheide, Inc.
  • Dan Stromberg at Jan 5, 2011 at 8:02 pm

    On Tue, Jan 4, 2011 at 11:31 PM, Tim Roberts wrote:
    Zdenko wrote:
    Please, can anybody write me a simple recursive matrix multiplication
    using multiple threads in Python, or at least show me some guidelines
    how to write it myself
    Matrix multiplication is not generally done recursively. There's no
    conceptual gain. It makes more sense iteratively.
    It may not be that common, but there is at least one significant
    advantage: Cache obliviousness.

    http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-fourteen
