I noticed that if a file is being continuously written to, the file
generator does not notice it:



    def getLines(f):
        lines = []
        for line in f:
            lines.append(line)
        return lines

    with open('/var/log/syslog', 'rb') as f:
        lines = getLines(f)
        # do some processing with lines
        # /var/log/syslog gets updated in the meantime

        # always returns an empty list, even though f has more data
        lines = getLines(f)




I found a workaround by adding f.seek(0,1) directly before the last
getLines() call, but is this the expected behavior? Calling f.tell()
right after the first getLines() call shows that it isn't reset back to
0. Is this correct or a bug?

--
Bill


  • Ian Kelly at Jul 14, 2011 at 8:00 pm

    On Thu, Jul 14, 2011 at 1:46 PM, Billy Mays wrote:
    def getLines(f):
        lines = []
        for line in f:
            lines.append(line)
        return lines

    with open('/var/log/syslog', 'rb') as f:
        lines = getLines(f)
        # do some processing with lines
        # /var/log/syslog gets updated in the meantime

        # always returns an empty list, even though f has more data
        lines = getLines(f)




    I found a workaround by adding f.seek(0,1) directly before the last
    getLines() call, but is this the expected behavior? Calling f.tell() right
    after the first getLines() call shows that it isn't reset back to 0. Is
    this correct or a bug?
    This is expected. Part of the iterator protocol is that once an
    iterator raises StopIteration, it should continue to raise
    StopIteration on subsequent next() calls.
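
    As a minimal illustration of that contract, with a plain generator
    rather than a file:

        def gen():
            yield 1
            yield 2

        g = gen()
        print list(g)   # [1, 2]
        print list(g)   # []  -- exhausted; every further next() raises StopIteration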
  • Billy Mays at Jul 14, 2011 at 8:15 pm

    On 07/14/2011 04:00 PM, Ian Kelly wrote:
    On Thu, Jul 14, 2011 at 1:46 PM, Billy Mays wrote:
    def getLines(f):
        lines = []
        for line in f:
            lines.append(line)
        return lines

    with open('/var/log/syslog', 'rb') as f:
        lines = getLines(f)
        # do some processing with lines
        # /var/log/syslog gets updated in the meantime

        # always returns an empty list, even though f has more data
        lines = getLines(f)




    I found a workaround by adding f.seek(0,1) directly before the last
    getLines() call, but is this the expected behavior? Calling f.tell() right
    after the first getLines() call shows that it isn't reset back to 0. Is
    this correct or a bug?
    This is expected. Part of the iterator protocol is that once an
    iterator raises StopIteration, it should continue to raise
    StopIteration on subsequent next() calls.
    Is there any way to just create a new generator that clears its `closed`
    status?

    --
    Bill
  • Hrvoje Niksic at Jul 14, 2011 at 8:39 pm

    Billy Mays <noway at nohow.com> writes:

    Is there any way to just create a new generator that clears its
    `closed` status?
    You can define getLines in terms of the readline file method, which does
    return new data when it is available.

    def getLines(f):
        lines = []
        while True:
            line = f.readline()
            if line == '':
                break
            lines.append(line)
        return lines

    or, more succinctly:

    def getLines(f):
        return list(iter(f.readline, ''))
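
    For instance, a sketch of polling a growing file with it (process() is
    a stand-in for whatever handling you do):

        import time

        def follow(f, process):
            while True:
                for line in iter(f.readline, ''):
                    process(line)
                time.sleep(1)   # no new data right now; check again shortly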
  • Terry Reedy at Jul 14, 2011 at 8:43 pm

    On 7/14/2011 3:46 PM, Billy Mays wrote:
    I noticed that if a file is being continuously written to, the file
    generator does not notice it:
    Because it does not look, as Ian explained.
    def getLines(f):
        lines = []
        for line in f:
            lines.append(line)
        return lines
    This nearly duplicates .readlines, except for using f as an iterator.
    Try the following (untested):

    with open('/var/log/syslog', 'rb') as f:
        lines = f.readlines()
        # do some processing with lines
        # /var/log/syslog gets updated in the meantime
        lines = f.readlines()

    People regularly do things like this with readline, so it is possible.
    If the above does not work, try (untested):

    def getlines(f):
        lines = []
        while True:
            l = f.readline()
            if l: lines.append(l)
            else: return lines

    --
    Terry Jan Reedy
  • Bruno Desthuilliers at Jul 15, 2011 at 8:01 am

    On Jul 14, 9:46 pm, Billy Mays wrote:
    I noticed that if a file is being continuously written to, the file
    generator does not notice it:

    def getLines(f):
        lines = []
        for line in f:
            lines.append(line)
        return lines
    What's wrong with file.readlines()?
  • Billy Mays at Jul 15, 2011 at 12:26 pm

    On 07/15/2011 04:01 AM, bruno.desthuilliers at gmail.com wrote:
    On Jul 14, 9:46 pm, Billy Mays wrote:
    I noticed that if a file is being continuously written to, the file
    generator does not notice it:

    def getLines(f):
        lines = []
        for line in f:
            lines.append(line)
        return lines
    What's wrong with file.readlines()?
    Using that will read the entire file into memory which may not be
    possible. In the library reference, it mentions that using the
    generator (which calls file.next()) uses a read-ahead buffer to
    efficiently loop over the file. If I call .readline() myself, I forfeit
    that performance gain.

    I was thinking that a convenient solution to this problem would be to
    introduce a new Exception called PauseIteration, which would signal to the
    caller that there is no more data for now, but not to close down the
    generator entirely.

    --
    Bill
  • Thomas Rachel at Jul 15, 2011 at 2:21 pm

    On 15.07.2011 14:26, Billy Mays wrote:

    I was thinking that a convenient solution to this problem would be to
    introduce a new Exception called PauseIteration, which would signal to the
    caller that there is no more data for now, but not to close down the
    generator entirely.
    Alas, an exception thrown causes the generator to stop.
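
    For example (a minimal sketch):

        def g():
            yield 1
            raise ValueError("boom")
            yield 2              # never reached

        it = g()
        print it.next()          # 1
        try:
            it.next()            # raises ValueError...
        except ValueError:
            pass
        print list(it)           # []  -- ...and the generator is finished for good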


    Thomas
  • Terry Reedy at Jul 15, 2011 at 9:45 pm

    On 7/15/2011 8:26 AM, Billy Mays wrote:
    On 07/15/2011 04:01 AM, bruno.desthuilliers at gmail.com wrote:
    On Jul 14, 9:46 pm, Billy Mays wrote:
    I noticed that if a file is being continuously written to, the file
    generator does not notice it:

    def getLines(f):
        lines = []
        for line in f:
            lines.append(line)
        return lines
    What's wrong with file.readlines()?
    Using that will read the entire file into memory which may not be
    So will getLines.
    possible. In the library reference, it mentions that using the generator
    (which calls file.next()) uses a read-ahead buffer to efficiently loop
    over the file. If I call .readline() myself, I forfeit that performance
    gain.
    Are you sure? Have you measured the difference?
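
    A rough way to check (a sketch; assumes /var/log/syslog is large enough
    for the difference to show up):

        import timeit

        setup = "f = open('/var/log/syslog', 'rb')"
        print timeit.timeit("f.seek(0); [l for l in f]",
                            setup=setup, number=10)
        print timeit.timeit("f.seek(0); list(iter(f.readline, ''))",
                            setup=setup, number=10)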

    --
    Terry Jan Reedy
  • Steven D'Aprano at Jul 16, 2011 at 3:42 am

    Billy Mays wrote:

    I was thinking that a convenient solution to this problem would be to
    introduce a new Exception called PauseIteration, which would signal to the
    caller that there is no more data for now, but not to close down the
    generator entirely.
    It never fails to amuse me how often people consider it "convenient" to add
    new built-in functionality to Python to solve every little issue. As
    pie-in-the-sky wishful-thinking, it can be fun, but people often mean it to
    be taken seriously.

    Okay, we've come up with the solution of a new exception, PauseIteration,
    that the iterator protocol will recognise. Now we have to:

    - write a PEP for it, setting out the case for it;
    - convince the majority of CPython developers that the idea is a good one,
    which might mean writing a proof-of-concept version;
    - avoid having the Jython, IronPython and PyPy developers come back and say
    that it is impossible under their implementations;
    - avoid having Guido veto it;
    - write an implementation or patch adding that functionality;
    - try to ensure it doesn't cause any regressions in the CPython tests;
    - fix the regressions that do occur despite our best efforts;
    - ensure that there are no backwards compatibility issues to be dealt with;
    - write a test suite for it;
    - write documentation for it;
    - unless we're some of the most senior Python developers, have the patch
    reviewed before it is accepted;
    - fix the bugs that have come to light since the first version;
    - make sure copyright is assigned to the Python Software Foundation;
    - wait anything up to a couple of years for the latest version of Python,
    including the patch, to be released as production-ready software;
    - upgrade our own Python installation to use the latest version, if we can
    and aren't forced to stick with an older version

    and now, at long last, we can use this convenient feature in our own code!
    Pretty convenient, yes?

    (If you think I exaggerate, consider the "yield from" construct, which has
    Guido's support and was pretty uncontroversial. Two and a half years later,
    it is now on track to be added to Python 3.3.)

    Or you can look at the various recipes on the Internet for writing tail-like
    file viewers in Python, and solve the problem the boring old-fashioned way.
    Here's one that blocks while the file is unchanged:

    http://lethain.com/tailing-in-python/

    Modifying it to be non-blocking should be pretty straightforward -- just add
    a `yield ""` after the `if not line`.
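
    That is, something like this sketch (modeled on the general shape of
    such recipes, not copied from the link):

        def tail(f):
            f.seek(0, 2)      # start at the end of the file
            while True:
                line = f.readline()
                if not line:
                    yield ""  # no data for now; the caller decides what to do
                    continue
                yield line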



    --
    Steven
  • Chris Angelico at Jul 16, 2011 at 4:07 am

    On Sat, Jul 16, 2011 at 1:42 PM, Steven D'Aprano wrote:
    Okay, we've come up with the solution of a new exception, PauseIteration,
    that the iterator protocol will recognise. Now we have to:

    - write an implementation or patch adding that functionality;

    - and add it to our own personal builds of Python, thus bypassing the
    entire issue of getting it accepted into Python. Of course, this does
    mean that your brilliant code only works on your particular build of
    Python, but I'd say that this is the first step - before writing up
    the PEP, run it yourself and see whether you even like the way it
    feels.

    THEN, once you've convinced yourself, start convincing others (ie PEP).

    ChrisA
  • Cameron Simpson at Jul 16, 2011 at 11:28 pm

    On 16Jul2011 13:42, Steven D'Aprano wrote:
    Billy Mays wrote:
    I was thinking that a convenient solution to this problem would be to
    introduce a new Exception called PauseIteration, which would signal to the
    caller that there is no more data for now, but not to close down the
    generator entirely.
    It never fails to amuse me how often people consider it "convenient" to add
    new built-in functionality to Python to solve every little issue. As
    pie-in-the-sky wishful-thinking, it can be fun, but people often mean it to
    be taken seriously.

    Okay, we've come up with the solution of a new exception, PauseIteration,
    that the iterator protocol will recognise.
    One might suggest that Billy could wrap his generator in a Queue(1) and
    use the .empty() test, and/or raise his own PauseIteration from the
    wrapper.
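
    Roughly, as a sketch of that idea (queued and feed are illustrative
    names):

        import threading, Queue

        def queued(gen):
            q = Queue.Queue(1)           # one-slot queue wrapping the generator
            def feed():
                for item in gen:
                    q.put(item)          # blocks while the slot is full
            t = threading.Thread(target=feed)
            t.daemon = True
            t.start()
            return q                     # consumer polls q.empty() / q.get()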
    --
    Cameron Simpson <cs at zip.com.au> DoD#743
    http://www.cskk.ezoshosting.com/cs/

    No team manager will tell you this; but they all want to see you
    come walking back into the pits sometimes, carrying the steering wheel.
    - Mario Andretti
  • Thomas Rachel at Jul 17, 2011 at 7:26 am
    On 16.07.2011 05:42, Steven D'Aprano wrote:

    You are right: it is a very big step for a very small piece of functionality.
    Or you can look at the various recipes on the Internet for writing tail-like
    file viewers in Python, and solve the problem the boring old fashioned way.

    It is not only about this "tail-like" thing. There are other legitimate
    use cases for this. I once found myself needing to feed a growing list
    of data into a database. On one hand, the growth could take several
    minutes to complete; on the other hand, the data should reach the
    database as soon as possible. So it was best to send, on every DB call,
    all the data present at that point, and to iterate over this until the
    end of the data.

    Back then, I wished for such a PauseIteration exception as well, but
    there was another, not-too-bad way to do it, so I did it this way
    (roughly: an iterable whose iterator was exhausted if no data were
    currently present, and which had a separate method for signalling end
    of data).

    Roughly:

    while not source.done():
        put_to_db(source)

    where put_to_db() iterates over source and issues the DB query with all
    data available to this point and then starting over.
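
    A rough sketch of such a source (names are illustrative only):

        class Source(object):
            def __init__(self):
                self.items = []
                self.finished = False
            def feed(self, item):        # producer adds data as it arrives
                self.items.append(item)
            def close(self):             # producer signals end of data
                self.finished = True
            def done(self):
                return self.finished and not self.items
            def __iter__(self):          # exhausted once currently empty,
                while self.items:        # but reusable afterwards
                    yield self.items.pop(0)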


    Thomas
  • Thomas Rachel at Jul 15, 2011 at 12:39 pm

    On 14.07.2011 21:46, Billy Mays wrote:
    I noticed that if a file is being continuously written to, the file
    generator does not notice it:
    Yes. That's why there were alternative suggestions in your last thread
    "How to write a file generator".

    To repeat mine: an object which is not an iterator, but an iterable.

    class Follower(object):
        def __init__(self, file):
            self.file = file
        def __iter__(self):
            while True:
                l = self.file.readline()
                if not l: return
                yield l

    if __name__ == '__main__':
        import time
        f = Follower(open("/var/log/messages"))
        while True:
            for i in f: print i,
            print "all read, waiting..."
            time.sleep(4)

    Here, you iterate over the object until it is exhausted, but you can
    iterate again to get the next entries.

    The difference to the file as iterator is, as you have noticed, that
    once an iterator is exhausted, it will be so forever.

    But if you have an iterable, like the Follower above, you can reuse it
    as you want.
  • Billy Mays at Jul 15, 2011 at 12:52 pm

    On 07/15/2011 08:39 AM, Thomas Rachel wrote:
    On 14.07.2011 21:46, Billy Mays wrote:
    I noticed that if a file is being continuously written to, the file
    generator does not notice it:
    Yes. That's why there were alternative suggestions in your last thread
    "How to write a file generator".

    To repeat mine: an object which is not an iterator, but an iterable.

    class Follower(object):
        def __init__(self, file):
            self.file = file
        def __iter__(self):
            while True:
                l = self.file.readline()
                if not l: return
                yield l

    if __name__ == '__main__':
        import time
        f = Follower(open("/var/log/messages"))
        while True:
            for i in f: print i,
            print "all read, waiting..."
            time.sleep(4)

    Here, you iterate over the object until it is exhausted, but you can
    iterate again to get the next entries.

    The difference to the file as iterator is, as you have noticed, that
    once an iterator is exhausted, it will be so forever.

    But if you have an iterable, like the Follower above, you can reuse it
    as you want.

    I did see it, but it feels less pythonic than using a generator. I did
    end up using an extra class to get more data from the file, but it seems
    like overhead. Also, in the python docs, file.next() mentions there
    being a performance gain for using the file generator (iterator?) over
    the readline function.

    Really what would be useful is some sort of PauseIteration Exception
    which doesn't close the generator when raised, but indicates to the
    looping header that there is no more data for now.

    --
    Bill
  • Chris Angelico at Jul 15, 2011 at 12:58 pm

    On Fri, Jul 15, 2011 at 10:52 PM, Billy Mays wrote:
    Really what would be useful is some sort of PauseIteration Exception which
    doesn't close the generator when raised, but indicates to the looping header
    that there is no more data for now.
    All you need is a sentinel yielded value (eg None).

    ChrisA
  • Thomas Rachel at Jul 15, 2011 at 2:28 pm

    On 15.07.2011 14:52, Billy Mays wrote:

    Also, in the python docs, file.next() mentions there
    being a performance gain for using the file generator (iterator?) over
    the readline function.
    Here, the question is whether this performance gain is really relevant,
    i.e. noticeable. The file object seems to keep an internal read-ahead
    buffer for iteration that is distinct from the one used by the
    readline() function. Why this is not the same buffer is unclear to me.

    Really what would be useful is some sort of PauseIteration Exception
    which doesn't close the generator when raised, but indicates to the
    looping header that there is no more data for now.
    A None or other sentinel value would do this as well (as ChrisA already
    said).


    Thomas
  • Billy Mays at Jul 15, 2011 at 2:42 pm

    On 07/15/2011 10:28 AM, Thomas Rachel wrote:
    On 15.07.2011 14:52, Billy Mays wrote:
    Also, in the python docs, file.next() mentions there
    being a performance gain for using the file generator (iterator?) over
    the readline function.
    Here, the question is whether this performance gain is really relevant,
    i.e. noticeable. The file object seems to keep an internal read-ahead
    buffer for iteration that is distinct from the one used by the
    readline() function. Why this is not the same buffer is unclear to me.

    Really what would be useful is some sort of PauseIteration Exception
    which doesn't close the generator when raised, but indicates to the
    looping header that there is no more data for now.
    A None or other sentinel value would do this as well (as ChrisA already
    said).


    Thomas
    A sentinel does provide a workaround, but it also passes the problem
    onto the caller rather than the callee:

    def getLines(f):
        while True:
            yield f.readline()

    def bar(f):
        for line in getLines(f):
            if not line:  # I now have to check here instead of in getLines
                break
            foo(line)


    def baz(f):
        for line in getLines(f) if line:  # this would be nice for generators
            foo(line)


    bar() is the correct way to do things, but I think baz() looks cleaner.
    I found myself writing baz() first, discovering it wasn't syntactically
    valid, and then converting it to bar(). The if clause in the loop
    header would be nice for generators, since it seems like the proper
    place for the sentinel to be matched. Also, with potentially infinite
    (but pauseable) data, there needs to be a nice way to handle cases like
    this.

    --
    Bill
  • Thomas Rachel at Jul 15, 2011 at 8:46 pm

    On 15.07.2011 16:42, Billy Mays wrote:

    A sentinel does provide a workaround, but it also passes the problem
    onto the caller rather than the callee:
    That is right.


    BTW, there is another, maybe easier way to do this:

    for line in iter(f.readline, ''):
        do_stuff(line)

    This provides an iterator which yields return values from the given
    callable until '' is returned, in which case the iterator stops.

    As the caller, you have to know that you can always continue.

    The functionality which you ask for COULD be accomplished in two ways:

    Firstly, one could simply break the "contract" of an iterator (which
    would be a bad thing): just have your next() raise a StopIteration and
    then continue nevertheless.

    Secondly, one could do a similar thing and have the next() method raise
    a different exception. The caller has to know about it as well, but I
    cannot find a passage in the docs which prohibits this.

    I just tested this:

    def r(x): return x
    def y(x): raise x

    def l(f, x): return lambda: f(x)

    class I(object):
        def __init__(self):
            self.l = [l(r, 1), l(r, 2), l(y, Exception), l(r, 3)]
        def __iter__(self):
            return self
        def next(self):
            if not self.l: raise StopIteration
            c = self.l.pop(0)
            return c()

    i = I()
    try:
        for j in i: print j
    except Exception, e: print "E:", e
    print tuple(i)

    and it works.


    So I think it COULD be ok to do this:

    class NotNow(Exception): pass

    class F(object):
        def __init__(self, f):
            self.file = f
        def __iter__(self):
            return self
        def next(self):
            l = self.file.readline()
            if not l: raise NotNow
            return l

    import time
    f = F(file("/var/log/messages"))
    while True:
        try:
            for i in f: print "", i,
        except NotNow, e:
            print "<pause>"
        time.sleep(1)


    HTH,

    Thomas
  • Ethan Furman at Jul 15, 2011 at 9:20 pm

    Billy Mays wrote:
    A sentinel does provide a workaround, but it also passes the problem
    onto the caller rather than the callee
    The callee can easily take care of it -- just block until more is ready.
    If blocking is not an option, then the caller has to deal with it no
    matter how the callee is implemented -- an exception, a sentinel, or some
    signal that says "nope, nothing for ya! try back later!"
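
    A sketch of that blocking variant (blocking_lines is an illustrative
    name):

        import time

        def blocking_lines(f):
            while True:
                line = f.readline()
                if line:
                    yield line
                else:
                    time.sleep(0.5)   # nothing new yet: wait, then try again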

    ~Ethan~
  • Terry Reedy at Jul 15, 2011 at 9:47 pm

    On 7/15/2011 10:42 AM, Billy Mays wrote:
    On 07/15/2011 10:28 AM, Thomas Rachel wrote:
    On 15.07.2011 14:52, Billy Mays wrote:
    Really what would be useful is some sort of PauseIteration Exception
    which doesn't close the generator when raised, but indicates to the
    looping header that there is no more data for now.
    A None or other sentinel value would do this as well (as ChrisA already
    said).
    A sentinel does provide a workaround, but it also passes the problem
    onto the caller rather than the callee:
    No more so than a new exception that the caller has to recognize.

    --
    Terry Jan Reedy

Discussion Overview
group: python-list
category: python
posted: Jul 14, '11 at 7:46p
active: Jul 17, '11 at 7:26a
posts: 21
users: 11
website: python.org
