Maybe not the best name, but it somehow describes what's going on...
So...

I've noticed that I use the following in several contexts:

chunk = []
for element in iterable:
    if isSeparator(element) and chunk:
        doSomething(chunk)
        chunk = []
if chunk:
    doSomething(chunk)
    chunk = []

If the iterable above is a file, isSeparator(element) is simply
defined as not element.strip(), and doSomething(chunk) is
yield(''.join(chunk)), then you have a paragraph splitter. I've been using
the same approach for slightly more complicated parsing recently.

However, the extra check at the end (i.e. the duplication) is a bit
ugly. A solution would be:

...
for element in iterable + separator:
...

but that isn't possible, of course. (It could be possible with some
fiddling with itertools etc., I guess.)

If it were possible to check whether the iterator extracted from the
iterable was at an end, that could help too -- but I see no elegant
way of doing it.

I can't really see any good way of using the while/break idiom either,
without resorting to explicit iterator pumping and a Boolean flag
(which isn't really all that elegant...):

it = iter(iterable)
chunk = []
done = False
while not done:
    try:
        element = it.next()
    except StopIteration:
        done = True
        element = SomeSeparator()
    if isSeparator(element) and chunk:
        doSomething(chunk)
        chunk = []

This seems far too wordy and clunky.

An alternative is:

it = iter(iterable)
chunk = []
while True:
    try:
        try:
            element = it.next()
        except StopIteration:
            element = SomeSeparator()
            break
    finally:
        if isSeparator(element) and chunk:
            doSomething(chunk)
            chunk = []

But this stuff is really just as bad as (or even quite a bit worse than)
the version with duplication.

I just thought I'd hear if someone can think of a more elegant way of
handling this sort of thing?
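
For readers on a newer Python than the one in this thread: one way to avoid the trailing check entirely is itertools.groupby, which was added to the standard library after this discussion. The following is only a sketch under that assumption; split_chunks and is_separator are illustrative names, not something proposed in the thread, and the loop includes the accumulate step that the snippet above leaves out.

```python
from itertools import groupby


def split_chunks(iterable, is_separator):
    """Yield lists of consecutive non-separator elements."""
    # bool() keeps the grouping key to exactly two values, so adjacent
    # separators (or adjacent non-separators) always land in one group.
    for is_sep, group in groupby(iterable, key=lambda e: bool(is_separator(e))):
        if not is_sep:
            yield list(group)


lines = ["a\n", "b\n", "\n", "  \n", "c\n"]
paragraphs = ["".join(chunk)
              for chunk in split_chunks(lines, lambda s: not s.strip())]
```

Runs of separators collapse into a single group that is simply skipped, so no empty chunk is produced and no flush is needed after the loop.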

--
Magnus Lie Hetland "Nothing shocks me. I'm a scientist."
http://hetland.org -- Indiana Jones


  • Tim Peters at Mar 23, 2003 at 7:19 pm
    [Magnus Lie Hetland]
    Maybe not the best name, but it somehow describes what's going on...
    So...

    I've noticed that I use the following in several contexts:

    chunk = []
    for element in iterable:
        if isSeparator(element) and chunk:
            doSomething(chunk)
            chunk = []
    if chunk:
        doSomething(chunk)
        chunk = []

    Since chunk is initialized to an empty list, the if clause in the loop can
    never evaluate to true, so this is equivalent to

    chunk = []
    for element in iterable:
        isSeparator(element)

    All variations of the code later in the msg suffer the same problem. As a
    result, I've got no idea what you intend the code to do. Does calling
    isSeparator(element), or <shudder> the process of iterating over iterable,
    mutate chunk as a side effect? If so, "yuck" comes to mind.

    If the code made sense <wink>, something like

    def terminated_iterator(iterable, a_separator):
        for element in iterable:
            yield element
        yield a_separator

    would produce the original sequence, then tack a_separator on to the end.
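
A minimal sketch of how such a helper removes the trailing check, written for current Python. It assumes the loop also accumulates non-separator elements (the step the quoted snippet omits); chunks and is_separator are illustrative names, not part of the post.

```python
def terminated_iterator(iterable, a_separator):
    """Yield every element of iterable, then one trailing separator."""
    for element in iterable:
        yield element
    yield a_separator


def chunks(iterable, is_separator, a_separator):
    """Group consecutive non-separator elements into lists."""
    chunk = []
    for element in terminated_iterator(iterable, a_separator):
        if is_separator(element):
            if chunk:
                yield chunk
                chunk = []
        else:
            chunk.append(element)
    # No trailing "if chunk:" here: the appended separator flushes the
    # final chunk inside the loop.


lines = ["a\n", "b\n", "\n", "\n", "c\n"]
result = list(chunks(lines, lambda s: not s.strip(), "\n"))
```

The price is that the caller has to supply some value that is_separator accepts.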
    ...
    However, the extra check at the end (i.e. the duplication) is a bit
    ugly. A solution would be:

    ...
    for element in iterable + separator:
    ...

    but that isn't possible, of course. (It could be possible with some
    fiddling with itertools etc., I guess.)
    WRT the preceding,

    for element in terminated_iterator(iterable, seperator):

    gets that effect. More generally,

    def concat(*seqs):
        "Generate all the elements of all the argument iterables."
        for seq in seqs:
            for x in seq:
                yield x

    and then, e.g.,

    for element in concat(iterable, [seperator]):
    ...
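
As a quick check of the idea, the sketch below compares concat with itertools.chain from the standard library, which does the same job of walking several iterables in turn. The sample values are made up for illustration.

```python
import itertools


def concat(*seqs):
    """Generate all the elements of all the argument iterables."""
    for seq in seqs:
        for x in seq:
            yield x


iterable = ["a", "b"]
separator = ""

via_concat = list(concat(iterable, [separator]))
via_chain = list(itertools.chain(iterable, [separator]))
```

Both are lazy, so either works on files or other one-shot iterators as well as on lists.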
    An alternative is:

    it = iter(iterable)
    chunk = []
    while True:
        try:
            try:
                element = it.next()
            except StopIteration:
                element = SomeSeparator()
                break
        finally:
            if isSeparator(element) and chunk:
                doSomething(chunk)
                chunk = []

    But this stuff is really just as bad as (or even quite a bit worse than)
    the version with duplication.

    Indeed, stick to sane alternatives.
  • Magnus Lie Hetland at Mar 23, 2003 at 11:51 pm

    In article <mailman.1048447384.30411.python-list at python.org>, Tim Peters wrote:
    [Magnus Lie Hetland]
    Maybe not the best name, but it somehow describes what's going on...
    So...

    I've noticed that I use the following in several contexts:
    A little fix...
    chunk = []
    for element in iterable:
        if isSeparator(element):
            if chunk:
                doSomething(chunk)
                chunk = []
        else:
            chunk.append(element)
    if chunk:
        doSomething(chunk)
        chunk = []
    The original was written in a hurry :]
    Since chunk is initialized to an empty list, the if clause in the loop can
    never evaluate to true, so this is equivalent to

    chunk = []
    for element in iterable:
        isSeparator(element)

    Yup.

    All variations of the code later in the msg suffer the same problem. As a
    result, I've got no idea what you intend the code to do. Does calling
    isSeparator(element), or <shudder> the process of iterating over iterable,
    mutate chunk as a side effect? If so, "yuck" comes to mind.
    No, sorry -- I just forgot parts of the code :)
    If the code made sense <wink>, something like

    def terminated_iterator(iterable, a_separator):
        for element in iterable:
            yield element
        yield a_separator

    would produce the original sequence, then tack a_separator on to the end.
    Yes, that's what I've done before (e.g. in an example in my book).
    Maybe that is the best way of doing it.

    [snip]
    WRT the preceding,

    for element in terminated_iterator(iterable, seperator):

    gets that effect.

    Indeed.

    More generally,

    def concat(*seqs):
        "Generate all the elements of all the argument iterables."
        for seq in seqs:
            for x in seq:
                yield x

    and then, e.g.,

    for element in concat(iterable, [seperator]):
    Yes. I posted something similar to that when discussing itertools
    previously. I guess I was (now) mainly looking for some basic use of
    control structures that I had overlooked.

    Anyway, thanks for the input.

    --
    Magnus Lie Hetland "Nothing shocks me. I'm a scientist."
    http://hetland.org -- Indiana Jones
  • Jeremy Fincher at Mar 24, 2003 at 7:04 am
    mlh at furu.idi.ntnu.no (Magnus Lie Hetland) wrote in message news:<slrnb7ruo9.4ip.mlh at furu.idi.ntnu.no>...
    I've noticed that I use the following in several contexts:

    chunk = []
    for element in iterable:
        if isSeparator(element) and chunk:
            doSomething(chunk)
            chunk = []
    if chunk:
        doSomething(chunk)
        chunk = []

    If the iterable above is a file, isSeparator(element) is simply
    defined as not element.strip() and doSomething(chunk) is
    yield(''.join(chunk)) you have a paragraph splitter. I've been using
    the same approach for slightly more complicated parsing recently.
    Maybe something like this can work?

    def itersplit(iterable, isSeparator):
        acc = []
        for element in iterable:
            if isSeparator(element):
                yield acc
                acc = []
            else:
                acc.append(element)
        yield acc


    Then your paragraph splitter might look like this:

    def paragraphSplitter(file):
        for L in itersplit(file, lambda s: not s.split()):
            yield ''.join(L)

    Jeremy
  • Magnus Lie Hetland at Mar 25, 2003 at 12:22 am
    In article <698f09f8.0303232304.608bc8cf at posting.google.com>, Jeremy
    Fincher wrote:
    Maybe something like this can work? [snip]
    def itersplit(iterable, isSeparator):
        acc = []
        for element in iterable:
            if isSeparator(element):
                yield acc
                acc = []
            else:
                acc.append(element)
        yield acc
    You should add "if acc" before you yield acc -- I don't want an empty
    acc (that only means several separators in a row -- which amounts to a
    single separator in my case). And, with that statement in place, you'd
    get the same duplication as before, as far as I can see. What is new
    about this (except putting it inside a generator)?

    Thanks for the input, though.

    --
    Magnus Lie Hetland "Nothing shocks me. I'm a scientist."
    http://hetland.org -- Indiana Jones
  • Jeremy Fincher at Mar 25, 2003 at 7:26 am
    mlh at furu.idi.ntnu.no (Magnus Lie Hetland) wrote in message news:<slrnb7v8ai.bop.mlh at furu.idi.ntnu.no>...
    You should add "if acc" before you yield acc -- I don't want an empty
    acc (that only means several separators in a row -- which amounts to a
    single separator in my case).
    That makes sense. To be truly general, that should be a named
    argument with a default to not return empty values.
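
One possible reading of that suggestion, as a sketch only: keep_empty is a hypothetical parameter name, and the default drops empty chunks as requested earlier in the thread.

```python
def itersplit(iterable, is_separator, keep_empty=False):
    """Split iterable into lists at elements for which is_separator is true.

    keep_empty is a hypothetical flag: when False (the default), empty
    chunks produced by leading, trailing or repeated separators are dropped.
    """
    acc = []
    for element in iterable:
        if is_separator(element):
            if acc or keep_empty:
                yield acc
            acc = []
        else:
            acc.append(element)
    # The final flush is still duplicated here; it is only hidden from callers.
    if acc or keep_empty:
        yield acc


data = [1, 2, 0, 0, 3]
compact = list(itersplit(data, lambda x: x == 0))
full = list(itersplit(data, lambda x: x == 0, keep_empty=True))
```

With the default, the two adjacent zeros act as a single separator; with keep_empty=True the empty chunk between them is reported.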

    And, with that statement in place, you'd
    get the same duplication as before, as far as I can see. What is new
    about this (except putting it inside a generator)?
    Simply that once writing it, the ugliness is contained in that one
    function, and all your code that needs the behavior you describe can
    be written much more beautifully :)

    Jeremy
  • Magnus Lie Hetland at Mar 25, 2003 at 2:07 pm
    In article <698f09f8.0303242326.60c2f3b5 at posting.google.com>, Jeremy
    Fincher wrote:
    mlh at furu.idi.ntnu.no (Magnus Lie Hetland) wrote in message news:<slrnb7v8ai.bop.mlh at furu.idi.ntnu.no>...
    You should add "if acc" before you yield acc -- I don't want an empty
    acc (that only means several separators in a row -- which amounts to a
    single separator in my case).
    That makes sense. To be truly general, that should be a named
    argument with a default to not return empty values.
    Maybe. I wasn't really looking for a general function/generator, but
    for an idiom (i.e. a way of solving this with basic tools). An
    unrealistic wish, perhaps :)
    And, with that statement in place, you'd
    get the same duplication as before, as far as I can see. What is new
    about this (except putting it inside a generator)?
    Simply that once writing it, the ugliness is contained in that one
    function, and all your code that needs the behavior you describe can
    be written much more beautifully :)

    Indeed.
    Jeremy
    --
    Magnus Lie Hetland "Nothing shocks me. I'm a scientist."
    http://hetland.org -- Indiana Jones
  • Alex Martelli at Mar 24, 2003 at 11:57 am
    Magnus Lie Hetland wrote:

    Ah, I recognize the outline of our joint contribution to the
    printed Cookbook (recipe 4.8...).
    I've noticed that I use the following in several contexts:
    [fixing as per followups]
    chunk = []
    for element in iterable:
        if isSeparator(element):
            if chunk:
                doSomething(chunk)
                chunk = []
        else: chunk.append(element)
    if chunk:
        doSomething(chunk)
        chunk = []
    First refactoring that comes to mind is:

    def maydosomething(chunk):
        if chunk:
            doSomething(chunk)
            chunk[:] = []

    chunk = []
    for element in iterable:
        if isSeparator(element): maydosomething(chunk)
        else: chunk.append(element)
    maydosomething(chunk)

    but this wouldn't work for the specific use case you require:
    If the iterable above is a file, isSeparator(element) is simply
    defined as not element.strip() and doSomething(chunk) is
    yield(''.join(chunk)) you have a paragraph splitter. I've been using
    i.e., factoring out a *yield* to maydosomething would NOT work.
    So I'll focus on the specific case of yield in the following,
    assuming a "munge" function such as
    def munge(chunk): return ''.join(chunk)
    is also passed as an argument.

    for element in iterable + separator:
    ...

    but that isn't possible, of course. (It could be possible with some
    fiddling with itertools etc., I guess.)
    Indeed, there ain't much "fiddling" needed at all -- you just
    DO need to know SOME acceptable separator, however:

    import itertools

    def chunkitup(iterable, isSeparator, aSeparator, munge=''.join):

        # a sanity check never hurts...
        assert isSeparator(aSeparator)

        chunk = []
        for element in itertools.chain(iterable, [aSeparator]):
            if isSeparator(element):
                yield munge(chunk)
                chunk = []
            else: chunk.append(element)
    If it were possible to check whether the iterator extracted from the
    iterable was at an end, that could help too -- but I see no elegant
    way of doing it.
    Elegance is in the eye of the beholder, but...:

    class iter_with_lookahead:
        def __init__(self, iterable):
            self.it = iter(iterable)
            self.done = False
            self.step()
        def __iter__(self):
            return self
        def step(self):
            try:
                self.lookahead = self.it.next()
            except StopIteration:
                self.done = True
        def next(self):
            if self.done: raise StopIteration
            result = self.lookahead
            self.step()
            return result

    ...I've had occasion to use variants of this in order to be able
    to peek ahead, check if an iterator was done, or in small further
    variants to give an iterator one level of "pushback", etc, etc.
    So, if you have a wrapper such as this one around somewhere, you
    might choose to reuse it (though it probably wouldn't be worth
    developing for the sole purpose of this use!-):

    def chunkitup1(iterable, isSeparator, munge=''.join):
        chunk = []
        it = iter_with_lookahead(iterable)
        for element in it:
            issep = isSeparator(element)
            if not issep:
                chunk.append(element)
            if issep or it.done:
                yield munge(chunk)
                chunk = []
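
For anyone trying this on Python 3, where the iterator method is __next__ and it.next() is spelled next(it), the same lookahead wrapper might look roughly like the sketch below. The class and argument names are adapted for illustration; the logic is meant to mirror the code above.

```python
class IterWithLookahead:
    """Iterator wrapper that reads one element ahead, so `done` is already
    True while the last element is being processed."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.done = False
        self._step()

    def __iter__(self):
        return self

    def _step(self):
        # Fetch the next element early; mark exhaustion instead of raising.
        try:
            self.lookahead = next(self._it)
        except StopIteration:
            self.done = True

    def __next__(self):
        if self.done:
            raise StopIteration
        result = self.lookahead
        self._step()
        return result


def chunkitup1(iterable, is_separator, munge="".join):
    chunk = []
    it = IterWithLookahead(iterable)
    for element in it:
        issep = is_separator(element)
        if not issep:
            chunk.append(element)
        if issep or it.done:
            yield munge(chunk)
            chunk = []


paragraphs = list(chunkitup1(["a\n", "b\n", "\n", "c\n"],
                             lambda s: not s.strip()))
```

As in the original, consecutive separators yield an empty chunk; add an "and chunk" test to the flush condition if that is unwanted.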
    I can't really see any good way of using the while/break idiom either,
    Well, you COULD use a different wrapper class to obtain code such as:

    def chunkitup2(iterable, isSeparator, munge=''.join):
        wit = wild_thing(iterable, isSeparator)
        while wit:
            if wit.isSeparator() and wit.hasChunk():
                yield munge(wit.getChunk())

    but the wrapper wouldn't be all that nice under the covers AND it
    would in practice have to embody a bit too much of the control
    logic and bury it in a non-obvious place -- so I wouldn't pursue
    this tack, myself.


    Alex
  • Magnus Lie Hetland at Mar 25, 2003 at 12:36 am

    In article <YoCfa.960$i26.18361 at news2.tin.it>, Alex Martelli wrote:
    Magnus Lie Hetland wrote:

    Ah, I recognize the outline of our joint contribution to the
    printed Cookbook (recipe 4.8...).
    :)

    I use this sort of thing in the "Instant Markup" chapter of my book as
    well. But I haven't found a nice solution, except artificially tucking
    an extra separator onto the iterator. Maybe that's nice enough,
    though...
    I've noticed that I use the following in several contexts:
    [fixing as per followups]
    Good :)

    [snip]
    First refactoring that comes to mind is:
    Yes. I thought of refactoring as a solution too. It does reduce the
    duplication to a duplicated function call -- which may be good enough.
    I just sort of hoped I could avoid it altogether :]
    but this wouldn't work for the specific use case you require:
    If the iterable above is a file, isSeparator(element) is simply
    defined as not element.strip() and doSomething(chunk) is
    yield(''.join(chunk)) you have a paragraph splitter. I've been using
    i.e., factoring out a *yield* to maydosomething would NOT work.
    Now -- the yield isn't important. I could very well update a list
    instead. (Although it would be nice if the yield-thing were possible
    too...)
    So I'll focus on the specific case of yield in the following,
    assuming a "munge" function such as
    def munge(chunk): return ''.join(chunk)
    is also passed as an argument.
    OK.

    [snip]
    Indeed, there ain't much "fiddling" needed at all
    But some, though ;)
    -- you just
    DO need to know SOME acceptable separator, however:
    Hm. Yeah -- that's sort of the thing I don't really like, I guess.
    (It's a pretty vague feeling, though ;)
    import itertools

    for element in itertools.chain(iterable, [aSeparator]):
    Yup. This was discussed separately in another thread, where I
    suggested some related tools to itertools (and got the above as a
    suggested alternative).

    This is, as I mentioned, more or less what I did in the "Instant
    Markup" thing. I simply used something along the lines of

    def lines(file):
        for line in file:
            yield line
        yield '\n'

    and then iterated over that when producing paragraphs.

    [snip]
    Elegance is in the eye of the beholder, but...:
    Indeed. My notion of elegance has been known to be a bit superficial
    at times ;)

    [snip]

    I'm actually using something very similar in another context (where I
    need to know whether two thingies of the same kind are next to each
    other :)
    ...I've had occasion to use variants of this in order to be able
    to peek ahead, check if an iterator was done, or in small further
    variants to give an iterator one level of "pushback", etc, etc.
    Indeed. Perhaps some simple, general version of this (maybe even like
    the one you described above -- although perhaps with a slightly more
    pithy name?) might be a candidate for itertools? Lookahead can be
    useful in many cases (such as when writing a parser, for instance :)
    So, if you have a wrapper such as this one around somewhere, you
    might choose to reuse it (though it probably wouldn't be worth
    developing for the sole purpose of this use!-):
    Indeed...

    [snip]
    if issep or it.done:
    Yes -- this is exactly what I'm missing, I suppose.
    I can't really see any good way of using the while/break idiom
    either,
    Well, you COULD use a different wrapper class to obtain code such as:
    Yeah... If I was to write a wrapper class in the first place, I could
    do pretty much anything, I suppose :]
    but the wrapper wouldn't be all that nice under the covers AND it
    would in practice have to embody a bit too much of the control
    logic and bury it in a non-obvious place -- so I wouldn't pursue
    this tack, myself.
    Agreed.

    I guess what it all boils down to is that it would be nice to know
    whether one is in the last iteration of a for loop. Sadly, I see no
    way of doing that in the general case.
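
A small generator can approximate "am I in the last iteration?" for any iterable by staying one element behind. This is only a sketch; with_last and chunks are hypothetical names, not something from the thread or from itertools.

```python
def with_last(iterable):
    """Yield (element, is_last) pairs; is_last is True only for the final one."""
    it = iter(iterable)
    try:
        previous = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for current in it:
        yield previous, False
        previous = current
    yield previous, True


def chunks(iterable, is_separator):
    """Group non-separator elements, flushing on a separator or the last item."""
    chunk = []
    for element, is_last in with_last(iterable):
        issep = is_separator(element)
        if not issep:
            chunk.append(element)
        if (issep or is_last) and chunk:
            yield chunk
            chunk = []


result = list(chunks(["a", "b", "", "", "c"], lambda s: s == ""))
pairs = list(with_last("xyz"))
```

The flush now appears once, inside the loop, at the cost of delaying each element by one step.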
    Alex
    --
    Magnus Lie Hetland "Nothing shocks me. I'm a scientist."
    http://hetland.org -- Indiana Jones
  • Manuel Garcia at Mar 25, 2003 at 7:54 pm

    On Sun, 23 Mar 2003 14:19:53 -0500, "Tim Peters" wrote:

    If the code made sense <wink>, something like

    def terminated_iterator(iterable, a_separator):
        for element in iterable:
            yield element
        yield a_separator

    would produce the original sequence, then tack a_separator on to the end.
    Isn't it a general rule that terminators are easier to work with than
    separators? I remember some programming guru saying this (Jon
    Bentley?) I think it was Pascal's use of separators between
    statements that convinced Dennis Ritchie to use terminators instead.

    When I have to deal with separators, I always tack an extra one on the
    end, using a trick like the one above, or a simple append or
    concatenation. This is usually good to make a boolean or repeated
    code vanish. For string processing, I usually throw an extra one on
    the front too, for good luck.
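
To illustrate the point on a string, here is a sketch of a hand-written field splitter that first turns the separator into a terminator. split_fields is a made-up name and the code assumes a single-character separator.

```python
def split_fields(text, sep=","):
    """Split text on a one-character sep by treating sep as a terminator."""
    fields = []
    start = 0
    terminated = text + sep  # every field, including the last, now ends in sep
    for index, char in enumerate(terminated):
        if char == sep:
            fields.append(terminated[start:index])
            start = index + 1
    # No "append whatever is left over" step is needed after the loop.
    return fields


fields = split_fields("a,b,,c")
```

Without the appended terminator, the loop would need a second append after it to pick up the final field.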

    Separator is also harder to spell than terminator. ;-)

    Manuel
  • Daniel Timothy Bentley at Mar 25, 2003 at 10:54 pm

    On Tue, 25 Mar 2003, Manuel Garcia wrote:

    Isn't it a general rule that terminators are easier to work with than
    separators? I remember some programming guru saying this (Jon
    Bentley?) I think it was Pascal's use of separators between
    Doubtful. He can't remember such a thing.
    ObOldPersonMemoryDisclaimer

    -Dan
    Manuel
  • Manuel M Garcia at Mar 26, 2003 at 6:14 am

    On Tue, 25 Mar 2003 14:54:20 -0800, Daniel Timothy Bentley wrote:
    Isn't it a general rule that terminators are easier to work with than
    separators? I remember some programming guru saying this (Jon
    Bentley?) I think it was Pascal's use of separators between
    Doubtful. He can't remember such a thing.
    ObOldPersonMemoryDisclaimer
    I guess it could be considered a special case of adding a sentinel at
    a boundary (5.1 in 'Writing Efficient Programs'). But that section is
    definitely not talking specifically about terminators vs. separators.

    Maybe because I have done so much 'multi-dimensional' database
    programming, I am acutely aware how separators make for lousy data
    structures.

    Manuel

Discussion Overview
group: python-list
categories: python
posted: Mar 23, '03 at 6:20p
active: Mar 26, '03 at 6:14a
posts: 12
users: 7
website: python.org