FAQ
I have a long string (several Mbytes). I want to iterate over it in manageable chunks (say, 1 kbyte each). For (a small) example, if I started with "this is a very long string", and I wanted 10 character chunks, I should get:


"this is a "
"very long "
"string"


This seems like something itertools would do, but I don't see anything. Is there something, or do I just need to loop and slice (and worry about getting all the edge conditions right) myself?


---
Roy Smith
roy at panix.com


  • Mark Lawrence at Nov 8, 2013 at 5:52 pm

    On 08/11/2013 17:48, Roy Smith wrote:
    I have a long string (several Mbytes). I want to iterate over it in manageable chunks (say, 1 kbyte each). For (a small) example, if I started with "this is a very long string", and I wanted 10 character chunks, I should get:

    "this is a"
    "very long"
    "string"

    This seems like something itertools would do, but I don't see anything. Is there something, or do I just need to loop and slice (and worry about getting all the edge conditions right) myself?

    ---
    Roy Smith
    roy at panix.com

    Any good to you http://pythonhosted.org/more-itertools/api.html ?
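
    For what it's worth, a minimal sketch of how that package could be applied
    here, assuming more-itertools is installed; its chunked() helper yields
    lists of items, so each group is joined back into a string:

        from more_itertools import chunked

        s = "this is a very long string"
        pieces = ["".join(group) for group in chunked(s, 10)]
        # pieces == ['this is a ', 'very long ', 'string']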


    --
    Python is the second best programming language in the world.
    But the best has yet to be invented. Christian Tismer


    Mark Lawrence
  • Nick Cash at Nov 8, 2013 at 5:59 pm
    I have a long string (several Mbytes). I want to iterate over it in manageable chunks

    This is a weirdly common question. See http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python for several solutions.


    It's been proposed to be added to itertools before, but rejected: https://mail.python.org/pipermail/python-ideas/2012-July/015671.html and http://bugs.python.org/issue13095
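
    For reference, the kind of general-purpose chunker discussed in those
    threads can be written in a few lines around itertools.islice; a minimal
    sketch (the function name is only illustrative) that works for any
    iterable, not just sliceable sequences:

        from itertools import islice

        def iter_chunks(iterable, size):
            """Yield lists of up to `size` items until `iterable` is exhausted."""
            it = iter(iterable)
            while True:
                chunk = list(islice(it, size))
                if not chunk:
                    break
                yield chunk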


    - Nick Cash
  • Roy Smith at Nov 8, 2013 at 6:24 pm
    Oh my, it turns out I don't really need to do this after all, due to previously undiscovered uber-coolness in the tools I'm using!


    My use case is that from inside of a Django view, I needed to retrieve a large file via an HTTP GET, and serve that back up, with some time delays inserted into the data stream. Turns out, requests (uber-cool tool #1) provides a way to iterate over the content of a GET, and Django (uber-cool tool #2) provides a way to build an HttpResponse from the data in an iterator. Epiphany! I ended up with (essentially) this:


    import time

    import requests
    from django.http import HttpResponse

    def stream_slow(request, song_id):
        """Streams a song, but does it extra slowly, for client testing
        purposes.
        """
        def _slow_stream(r, chunk_size):
            for chunk in r.iter_content(chunk_size):
                yield chunk
                time.sleep(0.1)

        url = get_url(song_id)  # get_url() is the poster's own helper mapping a song id to its URL
        response = requests.get(url, stream=True)
        return HttpResponse(_slow_stream(response, 1024))


    On Nov 8, 2013, at 12:59 PM, Nick Cash wrote:

    I have a long string (several Mbytes). I want to iterate over it in manageable chunks
    This is a weirdly common question. See http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python for several solutions.

    It's been proposed to be added to itertools before, but rejected: https://mail.python.org/pipermail/python-ideas/2012-July/015671.html and http://bugs.python.org/issue13095

    - Nick Cash



    ---
    Roy Smith
    roy at panix.com
  • Peter Otten at Nov 8, 2013 at 6:02 pm

    Roy Smith wrote:


    I have a long string (several Mbytes). I want to iterate over it in
    manageable chunks (say, 1 kbyte each). For (a small) example, if I
    started with "this is a very long string", and I wanted 10 character
    chunks, I should get:

    "this is a "
    "very long "
    "string"

    This seems like something itertools would do, but I don't see anything.
    Is there something, or do I just need to loop and slice (and worry about
    getting all the edge conditions right) myself?

    (x)range() can take care of the edges:

    s = "this is a very long string"
    def chunks(s, size):
    ... for start in xrange(0, len(s), size):
    ... yield s[start:start+size]
    ...
    list(chunks(s, 10))
    ['this is a ', 'very long ', 'string']
    list(chunks(s, 5))
    ['this ', 'is a ', 'very ', 'long ', 'strin', 'g']
    list(chunks(s, 100))
    ['this is a very long string']


    Or you use StringIO:

    >>> from functools import partial
    >>> from StringIO import StringIO
    >>> list(iter(partial(StringIO(s).read, 5), ""))
    ['this ', 'is a ', 'very ', 'long ', 'strin', 'g']


    And no, this need not be a one-liner ;)
  • Skip Montanaro at Nov 8, 2013 at 6:04 pm
    I have a long string (several Mbytes). I want to iterate over it in manageable chunks (say, 1 kbyte each).

    You don't mention if the string is in memory or on disk. If it's in memory:

    >>> for i in range(0, len(s), 10):
    ...     print repr(s[i:i+10])
    ...
    'this is a '
    'very long '
    'string'


    If your string is on disk, just loop over an open file object, reading
    your chunk size every pass of the loop.
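
    A minimal sketch of that on-disk pattern (the file name and chunk size
    here are only illustrative):

        with open("big_file.txt") as f:
            while True:
                chunk = f.read(1024)  # read at most 1024 characters per pass
                if not chunk:         # an empty result means end of file
                    break
                # ... process chunk ...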


    Skip
  • Zero Piraeus at Nov 8, 2013 at 6:09 pm
    :

    On Fri, Nov 08, 2013 at 12:48:12PM -0500, Roy Smith wrote:
    I have a long string (several Mbytes). I want to iterate over it in
    manageable chunks (say, 1 kbyte each).

    "this is a "
    "very long "
    "string"

    This seems like something itertools would do, but I don't see anything.

    You could use io.StringIO (or StringIO.StringIO in Python 2.x):


         from io import StringIO
         big_str = 'x' * 10000000
         stream = StringIO(big_str)
         while True:
             chunk = stream.read(1024)
             if not chunk:
                 break
             # process chunk


      -[]z.


    --
    Zero Piraeus: ad referendum
    http://etiol.net/pubkey.asc
  • Wxjmfauth at Nov 8, 2013 at 8:43 pm
    "(say, 1 kbyte each)": one "kilo" of characters or bytes?


    Glad to read some users are still living in an ascii world,
    at the "Unicode time" where an encoded code point size may vary
    between 1-4 bytes.




    Oops, sorry, I'm wrong, it can be much more.

    sys.getsizeof('ab')
    27
    sys.getsizeof('a\U0001d11e')
    48
    >>>


    jmf
  • Chris Angelico at Nov 8, 2013 at 8:53 pm

    On Sat, Nov 9, 2013 at 7:43 AM, wrote:
    Oops, sorry, I'm wrong, it can be much more.
    sys.getsizeof('ab')
    27
    sys.getsizeof('a\U0001d11e')
    48

    I know, overhead sucks doesn't it. Python is really abysmal at that;
    look how big a single bit is:

    sys.getsizeof(1)
    14
    sys.getsizeof(True)
    14


    On the flip side, Python gets really awesome at some other things.
    Your operating system probably takes an entire CD to distribute, maybe
    even a DVD, so that's either 700MB or 4.7GB, give or take. Look how
    efficiently Python can represent it:

    sys.getsizeof(os)
    36


    Wow!


    ChrisA
  • Tim Chase at Nov 8, 2013 at 9:04 pm

    On 2013-11-09 07:53, Chris Angelico wrote:
    On the flip side, Python gets really awesome at some other things.
    Your operating system probably takes an entire CD to distribute,
    maybe even a DVD, so that's either 700MB or 4.7GB, give or take.
    Look how efficiently Python can represent it:
    sys.getsizeof(os)
    36

    Someone has been hanging out too much over on that thread about
    compressing random data ;-)


    -tkc
  • Chris Angelico at Nov 8, 2013 at 9:04 pm

    On Sat, Nov 9, 2013 at 8:04 AM, Tim Chase wrote:
    On 2013-11-09 07:53, Chris Angelico wrote:
    On the flip side, Python gets really awesome at some other things.
    Your operating system probably takes an entire CD to distribute,
    maybe even a DVD, so that's either 700MB or 4.7GB, give or take.
    Look how efficiently Python can represent it:
    sys.getsizeof(os)
    36
    Someone has been hanging out too much over on that thread about
    compressing random data ;-)

    Hey, that's a bit unfair! Operating systems aren't full of random data!


    At least... well, I can't speak for Windows here...


    *dives for cover*


    ChrisA
  • Mark Lawrence at Nov 8, 2013 at 9:06 pm

    On 08/11/2013 20:53, Chris Angelico wrote:
    On Sat, Nov 9, 2013 at 7:43 AM, wrote:
    Oops, sorry, I'm wrong, it can be much more.
    sys.getsizeof('ab')
    27
    sys.getsizeof('a\U0001d11e')
    48
    I know, overhead sucks doesn't it. Python is really abysmal at that;
    look how big a single bit is:
    sys.getsizeof(1)
    14
    sys.getsizeof(True)
    14

    On the flip side, Python gets really awesome at some other things.
    Your operating system probably takes an entire CD to distribute, maybe
    even a DVD, so that's either 700MB or 4.7GB, give or take. Look how
    efficiently Python can represent it:
    sys.getsizeof(os)
    36

    Wow!

    ChrisA

    Those figures look really good but I actually want figures that do
    things my way, even if the figures aren't as good or even suck
    completely. Can you help me with this even if I've already asked 42
    times before but have always been given the same figures in response?


    --
    Python is the second best programming language in the world.
    But the best has yet to be invented. Christian Tismer


    Mark Lawrence
  • Chris Angelico at Nov 8, 2013 at 9:17 pm

    On Sat, Nov 9, 2013 at 8:06 AM, Mark Lawrence wrote:
    Those figures look really good but I actually want figures that do things my
    way, even if the figures aren't as good or even suck completely. Can you
    help me with this even if I've already asked 42 times before but have always
    been given the same figures in response?

    Yep! I can even offer you an hourglass figure. Here, watch this figure
    of an hourglass while you wait for a different answer...


    ChrisA
  • Mark Lawrence at Nov 8, 2013 at 8:57 pm

    On 08/11/2013 20:43, wxjmfauth at gmail.com wrote:
    "(say, 1 kbyte each)": one "kilo" of characters or bytes?

    Glad to read some users are still living in an ascii world,
    at the "Unicode time" where an encoded code point size may vary
    between 1-4 bytes.


    Oops, sorry, I'm wrong, it can be much more.
    sys.getsizeof('ab')
    27
    sys.getsizeof('a\U0001d11e')
    48
    jmf

    For any newcomers please ignore the rubbish that "Joseph McCarthy" Faust
    comes up with from time to time. He's been asked repeatedly to come up
    with evidence to support his claims regarding PEP 393, the Flexible
    String Representation, but he never does, clearly because he can't.
    Instead he provides micro benchmarks or meaningless numbers like those
    above.


    --
    Python is the second best programming language in the world.
    But the best has yet to be invented. Christian Tismer


    Mark Lawrence
  • Steven D'Aprano at Nov 9, 2013 at 12:46 am

    On Fri, 08 Nov 2013 12:43:43 -0800, wxjmfauth wrote:


    "(say, 1 kbyte each)": one "kilo" of characters or bytes?

    Glad to read some users are still living in an ascii world, at the
    "Unicode time" where an encoded code point size may vary between 1-4
    bytes.


    Oops, sorry, I'm wrong,

    That part is true.



    it can be much more.

    That part is false. You're measuring the overhead of the object
    structure, not the per-character storage. This has been the case going
    back since at least Python 2.2: strings are objects, and have overhead.

    sys.getsizeof('ab')
    27

    27 bytes for two characters! Except it isn't, it's actually 25 bytes for
    the object header and two bytes for the two characters.

    sys.getsizeof('a\U0001d11e')
    48

    And here you have four bytes each for the two characters and a 40 byte
    header. Observe:


    py> c = '\U0001d11e'
    py> len(c)
    1
    py> sys.getsizeof(2*c) - sys.getsizeof(c)
    4
    py> sys.getsizeof(1000*c) - sys.getsizeof(999*c)
    4




    How big is the object overhead on a (say) thousand character string? Just
    one percent:


    py> (sys.getsizeof(1000*c) - 4000)/4000
    0.01






    --
    Steven
  • Wxjmfauth at Nov 9, 2013 at 8:14 am

    Le samedi 9 novembre 2013 01:46:32 UTC+1, Steven D'Aprano a écrit :
    On Fri, 08 Nov 2013 12:43:43 -0800, wxjmfauth wrote:

    "(say, 1 kbyte each)": one "kilo" of characters or bytes?

    Glad to read some users are still living in an ascii world, at the
    "Unicode time" where an encoded code point size may vary between 1-4
    bytes.

    Oops, sorry, I'm wrong,

    That part is true.

    it can be much more.

    That part is false. You're measuring the overhead of the object
    structure, not the per-character storage. This has been the case going
    back since at least Python 2.2: strings are objects, and have overhead.

    sys.getsizeof('ab')
    27

    27 bytes for two characters! Except it isn't, it's actually 25 bytes for
    the object header and two bytes for the two characters.

    sys.getsizeof('a\U0001d11e')
    48

    And here you have four bytes each for the two characters and a 40 byte
    header. Observe:

    py> c = '\U0001d11e'
    py> len(c)
    1
    py> sys.getsizeof(2*c) - sys.getsizeof(c)
    4
    py> sys.getsizeof(1000*c) - sys.getsizeof(999*c)
    4

    How big is the object overhead on a (say) thousand character string? Just
    one percent:

    py> (sys.getsizeof(1000*c) - 4000)/4000
    0.01

    --------


    Sure, the new phone "xyz" does not cost 600$, it only costs
    100$ more than the "abc" 500$ phone model.




    If you wish to count the frequency of chars in a text
    and store the results in a dict, {char: number_of_that_char, ...},
    do not forget to save the key in utf-XXX, it saves memory.


    After all, it is much more fun to waste one's time coding
    and attempting to handle unicode properly, and to observe
    this poor Python wasting its time in conversions.

    sys.getsizeof('a')
    26
    sys.getsizeof('\U0001d11e')
    44
    sys.getsizeof('\U0001d11e'.encode('utf-32'))
    25




    Hint: If you attempt to do the same exercise with
    words in a "latin" text, never forget that the average length
    of a word is approximately 1000 chars.


    jmf
  • Chris Angelico at Nov 9, 2013 at 8:26 am

    On Sat, Nov 9, 2013 at 7:14 PM, wrote:
    If you wish to count the frequency of chars in a text
    and store the results in a dict, {char: number_of_that_char, ...},
    do not forget to save the key in utf-XXX, it saves memory.

    Oh, if you're that concerned about memory usage of individual
    characters, try storing them as integers:

    sys.getsizeof("a")
    26
    sys.getsizeof("a".encode("utf-32"))
    25
    sys.getsizeof("a".encode("utf-8"))
    18
    sys.getsizeof(ord("a"))
    14


    I really don't see that UTF-32 is much advantage here. UTF-8 happens
    to be, because I used an ASCII character, but the integer beats them
    all, even for larger numbers:
    sys.getsizeof(ord("\U0001d11e"))
    16


    And there's even less difference on my Linux box, but of course, you
    never compare against Linux because Python 3.2 wide builds don't suit
    your numbers.


    For longer strings, there's an even more efficient way to store them.
    Just store the memory address - that's going to be 4 bytes or 8,
    depending on whether it's a 32-bit or 64-bit build of Python. There's
    a name for this method of comparing strings: interning. Some languages
    do it automatically for all strings, others (like Python) only when
    you ask for it. Suddenly it doesn't matter at all what the storage
    format is - if the two strings are the same, their addresses are the
    same, and conversely. That's how to make it cheap.

    Hint: If you attempt to do the same exercise with
    words in a "latin" text, never forget that the average length
    of a word is approximately 1000 chars.

    I think you're confusing length of word with value of picture.


    ChrisA
  • Mark Lawrence at Nov 9, 2013 at 10:13 am
    On 09/11/2013 08:14, wxjmfauth at gmail.com wrote:


    I'll ask again, please don't send us double spaced google crap.


    --
    Python is the second best programming language in the world.
    But the best has yet to be invented. Christian Tismer


    Mark Lawrence
  • Roy Smith at Nov 9, 2013 at 2:37 pm
    In article <mailman.2283.1383985583.18130.python-list@python.org>,
      Chris Angelico wrote:

    Some languages [intern] automatically for all strings, others
    (like Python) only when you ask for it.

    What does "only when you ask for it" mean?
  • Chris Angelico at Nov 9, 2013 at 3:02 pm

    On Sun, Nov 10, 2013 at 1:37 AM, Roy Smith wrote:
    In article <mailman.2283.1383985583.18130.python-list@python.org>,
    Chris Angelico wrote:
    Some languages [intern] automatically for all strings, others
    (like Python) only when you ask for it.
    What does "only when you ask for it" mean?

    You can explicitly intern a Python string with the sys.intern()
    function, which returns either the string itself or an
    indistinguishable "interned" string. Two equal strings, when interned,
    will return the same object:

    foo = "asdf"
    bar = "as"
    bar += "df"
    foo is bar
    False


    Note that the Python interpreter is free to answer True there, but
    there's no mandate for it.

    foo = sys.intern(foo)
    bar = sys.intern(bar)
    foo is bar
    True


    Now it's mandated. The two strings must be the same object. Interning
    in this way makes string equality come down to an 'is' check, which is
    potentially a lot faster than actual string equality.


    Some languages (eg Pike) do this automatically with all strings - the
    construction of any string includes checking to see if it's a
    duplicate of any other string. This adds cost to string manipulation
    and speeds up string comparisons; since the engine knows that all
    strings are interned, it can do the equivalent of an 'is' check for
    any string equality.


    So what I meant, in terms of storage/representation efficiency, is
    that you can store duplicate strings very efficiently if you simply
    increment the reference counts of the same few objects. Python won't
    necessarily do that for you; check memory usage of something like
    this:


    strings = [open("some_big_file").read() for _ in range(10000)]


    And compare against this:


    strings = [sys.intern(open("some_big_file").read()) for _ in range(10000)]


    In a language that guarantees string interning, the syntax of the
    former would have the memory consumption of the latter. Whether that
    memory saving and improved equality comparison is worth the effort of
    dictionarification is one of those eternally-debatable points.


    ChrisA
  • Roy Smith at Nov 9, 2013 at 3:21 pm
    In article <mailman.2298.1384009376.18130.python-list@python.org>,
      Chris Angelico wrote:

    On Sun, Nov 10, 2013 at 1:37 AM, Roy Smith wrote:
    In article <mailman.2283.1383985583.18130.python-list@python.org>,
    Chris Angelico wrote:
    Some languages [intern] automatically for all strings, others
    (like Python) only when you ask for it.
    What does "only when you ask for it" mean?
    You can explicitly intern a Python string with the sys.intern()
    function
    [long, and good, explanation of interning]

    But, you missed the point of my question. You said that Python does
    this "only when you ask for it". That implies it never interns strings
    if you don't ask for it, which is clearly not true:


    $ python
    Python 2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
    [...]
    x = "foo"
    y = "foo"
    x is y
    True


    I think what you're trying to say is that there are several possible
    interning policies:


    1) Strings are never interned


    2) Strings are always interned


    3) Strings are optionally interned, at the discretion of the
    implementation


    4) The user may force a specific string to be interned by explicitly
    requesting it.


    and that Pike implements #1, while Python implements #3 and #4.
  • Chris Angelico at Nov 9, 2013 at 3:30 pm

    On Sun, Nov 10, 2013 at 2:21 AM, Roy Smith wrote:
    But, you missed the point of my question. You said that Python does
    this "only when you ask for it". That implies it never interns strings
    if you don't ask for it, which is clearly not true:

    $ python
    Python 2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
    [...]
    x = "foo"
    y = "foo"
    x is y
    True

    Ah! Yes, that's true; literals are interned - I forgot that. But
    anything from an external source won't be, hence my example with
    reading in the contents of a file.

    I think what you're trying to say is that there are several possible
    interning policies:

    1) Strings are never interned

    2) Strings are always interned

    3) Strings are optionally interned, at the discretion of the
    implementation

    4) The user may force a specific string to be interned by explicitly
    requesting it.

    and that Pike implements #1, while Python implements #3 and #4.

    Pike implements #2, I presume that was a typo. And yes, the interning
    of literals falls under #3, while sys.intern() gives #4. Use of #1
    would be restricted to languages with mutable strings, I would expect,
    for the same reason that Python tuples might be shared but lists won't
    be.


    ChrisA
  • Roy Smith at Nov 9, 2013 at 3:35 pm
    In article <mailman.2301.1384011026.18130.python-list@python.org>,
      Chris Angelico wrote:

    Pike implements #2, I presume that was a typo.

    Duh. Yes.
  • Steven D'Aprano at Nov 9, 2013 at 3:37 pm

    On Sat, 09 Nov 2013 09:37:54 -0500, Roy Smith wrote:


    In article <mailman.2283.1383985583.18130.python-list@python.org>,
    Chris Angelico wrote:
    Some languages [intern] automatically for all strings, others (like
    Python) only when you ask for it.
    What does "only when you ask for it" mean?

    In Python 2:


    help(intern)




    In Python 3:


    import sys
    help(sys.intern)




    for more info. I think that Chris is wrong about Python "only" interning
    strings if you explicitly ask for it. I recall that Python will (may?)
    automatically intern strings which look like identifiers (e.g. "spam" but
    not "Hello World" or "123abc"). Let's see now:


    # using Python 3.1 on Linux


    py> s = "spam"
    py> t = "spam"
    py> s is t
    True


    but:


    py> z = ''.join(["sp", "am"])
    py> z is s
    False


    However:


    py> u = "123abc"
    py> v = "123abc"
    py> u is v
    True


    Hmmm, obviously the rules are a tad more complicated than I thought... in
    any case, you shouldn't rely on automatic interning since it is an
    implementation dependent optimization and will probably change without
    notice.






    --
    Steven
  • Chris Angelico at Nov 9, 2013 at 10:14 pm

    On Sun, Nov 10, 2013 at 2:37 AM, Steven D'Aprano wrote:
    I think that Chris is wrong about Python "only" interning
    strings if you explicitly ask for it. I recall that Python will (may?)
    automatically intern strings which look like identifiers (e.g. "spam" but
    not "Hello World" or "123abc").

    I'm pretty sure it's simply that literals are interned, or at least
    shared across a module (and the interactive interpreter "counts" as a
    module). And it might still only be ones which look like identifiers,
    because:

    foo = "lorem ipsum dolor sit amet"
    bar = "lorem ipsum dolor sit amet"
    foo is bar
    False


    My "only" was false because of the sharing/interning of (some)
    literals, which I'd forgotten about; however, there's still the
    distinction that I was trying to draw, that in Python _some strings_
    are interned (a feature you can explicitly request), rather than _all
    strings_ being interned. And as is typical of python-list, it's this
    extremely minor point that became the new course of the thread - my
    main point was not whether all, some, or no strings get interned, but
    that string interning makes the storage space of duplicate strings
    immaterial :)


    ChrisA
  • Steven D'Aprano at Nov 10, 2013 at 6:39 am

    On Sun, 10 Nov 2013 09:14:28 +1100, Chris Angelico wrote:


    And
    as is typical of python-list, it's this extremely minor point that
    became the new course of the thread -

    You say that as if it were a bad thing :-P



    my main point was not whether all,
    some, or no strings get interned, but that string interning makes the
    storage space of duplicate strings immaterial :)

    True. It's not just a memory saver[1], but a time saver too. Using Python
    3.3:


    py> from timeit import Timer
    py> t1 = Timer('s == t', setup='s = "a b"*10000; t = "a b"*10000')
    py> t2 = Timer('s == t',
    ... setup='from sys import intern; s = intern("a b"*10000); '
    ... 't = intern("a b"*10000)')
    py> min(t1.repeat(number0000))
    7.651959054172039
    py> min(t2.repeat(number0000))
    0.00881262868642807




    String equality does a short-cut of checking for identity; if the strings
    are interned, they will be identical.






    [1] Assuming that you actually do have duplicate strings. If every string
    is unique, interning them potentially wastes memory.






    --
    Steven
  • Chris Angelico at Nov 10, 2013 at 8:46 am

    On Sun, Nov 10, 2013 at 5:39 PM, Steven D'Aprano wrote:
    On Sun, 10 Nov 2013 09:14:28 +1100, Chris Angelico wrote:

    And
    as is typical of python-list, it's this extremely minor point that
    became the new course of the thread -
    You say that as if it were a bad thing :-P

    More a curiosity than a bad thing.


    ChrisA
  • Steven D'Aprano at Nov 9, 2013 at 12:54 am

    On Fri, 08 Nov 2013 12:48:12 -0500, Roy Smith wrote:


    I have a long string (several Mbytes). I want to iterate over it in
    manageable chunks (say, 1 kbyte each). For (a small) example, if I
    started with "this is a very long string", and I wanted 10 character
    chunks, I should get:

    "this is a "
    "very long "
    "string"

    This seems like something itertools would do, but I don't see anything.
    Is there something, or do I just need to loop and slice (and worry about
    getting all the edge conditions right) myself?

    What edge conditions? Should be trivially easy to loop and slice:


    def grouper(string, size):
        i = 0
        while i < len(string):  # strict < avoids a trailing empty chunk when len is a multiple of size
            yield string[i:i+size]
            i += size




    But if you prefer, there is a recipe in the itertools documentation to
    solve this problem for you:


    http://docs.python.org/2/library/itertools.html#recipes


    It's short enough to reproduce here.


    from itertools import izip_longest
    def grouper(iterable, n, fillvalue=None):
         "Collect data into fixed-length chunks or blocks"
         # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
         args = [iter(iterable)] * n
         return izip_longest(fillvalue=fillvalue, *args)


    grouper(your_string, 10, '')


    ought to give you the results you want.
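
    Note that grouper() yields tuples of characters rather than strings, so to
    get string chunks back you would join each group:

    [''.join(group) for group in grouper(your_string, 10, '')]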




    I expect (but haven't tested) that for strings, the slice version will be
    faster.




    --
    Steven
