Hi,

I am using Fedora Core 3 and my Python version is 2.4.
Kernel version: 2.6.9-1.667smp.
Something freakish happens with Python hashes when I run a Python
script.
My Python file is basically:

myhash = {}
def summa():
    global myhash
    myhash[0] = 0
    myhash[1] = 1
    myhash[2] = 2
    myhash[3] = 3

I generate it by running a C program:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int i = atoi(argv[1]), j;

    printf("myhash = {}\n");
    printf("def summa():\n");
    printf("    global myhash\n");

    for (j = 0; j < i; j++)
        printf("    myhash[%d] = %d\n", j, j);

    printf("\nsumma()\n");
    return 0;
}

The output of this C program is redirected to a .py file.
I do the following steps with the .c file to create the .py file:

1. cc -o s s.c
2. ./s (input) >> test.py
3. python test.py

When I run Python on this .py file, I find that the process eats a lot
of my machine's virtual memory. Here are detailed examples of how high
the virtual memory goes (the VIRT column in top) for different inputs
to the C program:

1. input 100000   VIRT 119m
2. input 300000   VIRT 470m
3. input 700000   VIRT 1098m
4. input 1000000  VIRT 1598m

where VIRT is virtual memory and m is MB (megabytes).

These results are very alarming, as they mean that each hash[i] entry
requires approximately 1 KB of space.
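
For comparison, here is a minimal sketch (assuming a Linux /proc
filesystem; the vmsize helper below is made up for illustration) that
builds the same dictionary directly in a loop and reads the process
size before and after, so the cost of the dictionary itself can be
separated from everything else:

def vmsize():
    # read this process's VmSize (in kB) from /proc/self/status
    for line in open("/proc/self/status"):
        if line.startswith("VmSize:"):
            return int(line.split()[1])
    return -1

myhash = {}
before = vmsize()
for j in xrange(100000):
    myhash[j] = j
after = vmsize()
print "dict growth: about %d kB for 100000 entries" % (after - before)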

I would like to know why this happens and how to solve it.

I also tried changing the above .c file so that the new .c file has
multiple functions that divide the load of building the hash
structure. The results are again the same.

Please assist me with this problem!


  • Steven D'Aprano at Jan 19, 2006 at 8:41 am

    pycraze wrote:

    I do the following steps with the .c file to create the .py file:
    1. cc -o s s.c
    2. ./s (input) >> test.py
    3. python test.py
    You are appending to the test file. How many times have
    you appended to it? Once? Twice? A dozen times? Just
    what is in the file test.py after all this time?

    When I run Python on this .py file, I find that the process eats a
    lot of my machine's virtual memory. Here are detailed examples of
    how high the virtual memory goes (the VIRT column in top) for
    different inputs to the C program:

    1. input 100000   VIRT 119m
    The dictionary you create is going to be quite small:
    at least 780KB. Call it a megabyte. (What's a dozen or
    two K between friends?) Heck, call it 2MB.

    That still leaves a mysterious 117MB unaccounted for.
    Some of that will be the Python virtual machine and
    various other overhead. What else is there?

    Simple: you created a function summa with 100000 lines
    of code. That's a LOT of code to go into one object.
    Normally, 100,000 lines of code will be split across
    dozens or hundreds of functions and multiple modules. But
    you've created one giant lump of code that needs to be
    paged in and out of memory in one piece. Ouch!

    >>> def summa():
    ...     global hash
    ...     hash[0] = 0
    ...     hash[1] = 1
    ...
    >>> import dis      # get the byte-code disassembler
    >>> dis.dis(summa)  # and disassemble the function
      3           0 LOAD_CONST               1 (0)
                  3 LOAD_GLOBAL              0 (hash)
                  6 LOAD_CONST               1 (0)
                  9 STORE_SUBSCR

      4          10 LOAD_CONST               2 (1)
                 13 LOAD_GLOBAL              0 (hash)
                 16 LOAD_CONST               2 (1)
                 19 STORE_SUBSCR
                 20 LOAD_CONST               0 (None)
                 23 RETURN_VALUE

    That's how much bytecode you get for two keys. Now
    imagine how much you'll need for 100,000 keys.
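
    To see this concretely, here is a minimal sketch (an illustration
    only: it assumes the generated test.py can be imported from the
    current directory, where it shadows the standard library's test
    package, and it relies on the Python 2 func_code attribute of
    function objects):

    import test                   # importing compiles the generated module
    code = test.summa.func_code   # the code object behind the one giant function
    print "bytecode bytes:         ", len(code.co_code)
    print "constants kept alive:   ", len(code.co_consts)
    print "line-number table bytes:", len(code.co_lnotab)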

    You don't need to write the code from C, just do it all
    in Python:

    hash = {}
    def summa(i):
        global hash
        for j in range(i):
            hash[j] = j

    import sys
    summa(int(sys.argv[1]))   # the command-line argument arrives as a string


    Now run the script:


    python test.py 100000



    --
    Steven.
  • Pycraze at Jan 19, 2006 at 3:01 pm
    You are appending to the test file. How many times have
    you appended to it? Once? Twice? A dozen times? Just
    what is in the file test.py after all this time?
    When the input is 4:
    ./s 4 > test.py

    test.py is:

    myhash = {}
    def summa():
        global myhash
        myhash[0] = 0
        myhash[1] = 1
        myhash[2] = 2
        myhash[3] = 3

    If the input is now 100, then test.py will be:

    myhash = {}
    def summa():
        global myhash
        myhash[0] = 0
        myhash[1] = 1
        myhash[2] = 2
        myhash[3] = 3
        .......
        .......
        myhash[99] = 99
    I append only once, and I do this exercise only to get a very big hash.

    This result came as a bit of a surprise: when I construct large
    hashes, my system stalls.

    I was interested in finding out why, which is how I came up with
    this exercise.

    Anyway, thanks
    Dennis
  • Steve Holden at Jan 19, 2006 at 3:20 pm

    pycraze wrote:
    You are appending to the test file. How many times have
    you appended to it? Once? Twice? A dozen times? Just
    what is in the file test.py after all this time?

    When the input is 4:
    ./s 4 > test.py
    test.py is:
    myhash = {}
    def summa():
        global myhash
        myhash[0] = 0
        myhash[1] = 1
        myhash[2] = 2
        myhash[3] = 3

    If the input is now 100, then test.py will be:

    myhash = {}
    def summa():
        global myhash
        myhash[0] = 0
        myhash[1] = 1
        myhash[2] = 2
        myhash[3] = 3
        .......
        .......
        myhash[99] = 99
    I append only once, and I do this exercise only to get a very big hash.


    This result came as a bit of a surprise: when I construct large
    hashes, my system stalls.

    I was interested in finding out why, which is how I came up with
    this exercise.
    OK. Now examine a similar program:

    myhash = {}
    def summa(n):
        global myhash
        for i in range(n):
            myhash[i] = i

    summa(1000000)

    and see how this compares with your program having a million lines of
    source. Then repeat, changing the argument to summa. Then draw some
    conclusions. Then report those results back.
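
    For example, the loop version could be instrumented roughly like
    this (a sketch only; the grep of /proc is Linux-specific and
    reports the same figure that top shows as VIRT):

    import os, time

    myhash = {}
    def summa(n):
        global myhash
        for i in range(n):
            myhash[i] = i

    start = time.time()
    summa(1000000)
    print "built 1000000 keys in %.1f seconds" % (time.time() - start)
    # peek at this process's size; VmSize is what top displays as VIRT
    os.system("grep VmSize /proc/%d/status" % os.getpid())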

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC www.holdenweb.com
    PyCon TX 2006 www.python.org/pycon/
  • Pycraze at Jan 20, 2006 at 2:59 am
    Surely adopting the above method is much better than the approach
    I took earlier. The main reason I did this exercise is that when I
    have to marshal a 20-40 MB test.py file to disk, simply loading
    test.py will skyrocket my virtual memory consumption.

    I was a bit troubled by this bottleneck, so I wanted to do some
    research before coming to any conclusions.

    Anyway, I really appreciate your enthusiasm in helping me come to
    a conclusion.

    Dennis
  • Steve Holden at Jan 20, 2006 at 8:27 am

    pycraze wrote:
    Surely adopting the above method is much better than the approach
    I took earlier. The main reason I did this exercise is that when I
    have to marshal a 20-40 MB test.py file to disk, simply loading
    test.py will skyrocket my virtual memory consumption.

    I was a bit troubled by this bottleneck, so I wanted to do some
    research before coming to any conclusions.

    Anyway, I really appreciate your enthusiasm in helping me come to
    a conclusion.

    Dennis
    I'd be interested to know what other languages you have tested with
    20-40MB source files, and what conclusions you have arrived at about
    them. Particularly since your initial conclusion about Python was
    "dictionaries are weird and each entry uses approximately 1kB" :-)

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC www.holdenweb.com
    PyCon TX 2006 www.python.org/pycon/
  • Steven D'Aprano at Jan 21, 2006 at 3:17 am

    On Thu, 19 Jan 2006 18:59:11 -0800, pycraze wrote:

    Surely adopting the above method is much better than the approach
    I took earlier. The main reason I did this exercise is that when I
    have to marshal a 20-40 MB test.py file to disk, simply loading
    test.py will skyrocket my virtual memory consumption.
    Why do you have to marshal a 20MB test.py file? That's just crazy.

    The entire standard Python language, object files and source files
    combined, is about 90MB. There are 185 *.py modules just in the top
    level of the directory; that's less than half a megabyte per module
    (and in reality, much less than that).

    If your source file is more than 100K in size, you really should be
    breaking it into separate modules. If any one function is more than one
    page when printed out, you probably should be splitting it into two or
    more functions.
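
    If the point of a 20-40 MB test.py is only to get the dictionary
    onto disk and back, a minimal sketch along these lines (using the
    standard marshal module directly on the data; the filename is
    illustrative) avoids generating Python source altogether:

    import marshal

    myhash = dict((i, i) for i in xrange(1000000))

    # write the dictionary itself, not source code that rebuilds it
    f = open("myhash.dat", "wb")
    marshal.dump(myhash, f)
    f.close()

    # reading it back never touches the compiler, so no giant code object
    f = open("myhash.dat", "rb")
    myhash = marshal.load(f)
    f.close()
    print len(myhash), "keys restored"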




    --
    Steven.
