Is it possible to use the regular expression engine with other string-like
structures?

For example, I could have my own string class implemented as a list of
Python strings. Of course, my class would have an operator for direct access
to a character by index. (I think nothing more is necessary.)

I think that the new Java SDK 1.4 beta includes such a regular expression
engine.


  • Carlos Gaston Alvarez at Sep 19, 2001 at 7:20 am
    I don't think I understand the question.
    Why don't you implement a str() or toString() method and use the re engine
    on the resulting string?

    Bye,

    Gaston


    ----- Original Message -----
    From: "Juan Valiño" <juan.vali at terra.es>
    Newsgroups: comp.lang.python
    To: <python-list at python.org>
    Sent: Wednesday, September 19, 2001 12:16 AM
    Subject: regular expressions with other string classes

    Is it possible to use the regular expression engine with other string-like
    structures?

    For example, I could have my own string class implemented as a list of
    Python strings. Of course, my class would have an operator for direct access
    to a character by index. (I think nothing more is necessary.)

    I think that the new Java SDK 1.4 beta includes such a regular expression
    engine.


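The suggestion above (convert to a plain string, then search) can be sketched as follows. This is only an illustration in current Python 3: `ChunkedString` is a hypothetical class standing in for the "list of Python strings" described in the question, and nothing here comes from the original posters.

```python
import re


class ChunkedString:
    """Hypothetical string-like class that stores its text as a list of str chunks."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def __len__(self):
        return sum(len(chunk) for chunk in self.chunks)

    def __getitem__(self, index):
        # Direct access to a character by non-negative index.
        if index < 0:
            raise IndexError("negative indices are not supported in this sketch")
        for chunk in self.chunks:
            if index < len(chunk):
                return chunk[index]
            index -= len(chunk)
        raise IndexError("ChunkedString index out of range")

    def __str__(self):
        # Flatten to an ordinary string so the re module can work on it.
        return "".join(self.chunks)


text = ChunkedString(["regular expr", "essions with other ", "string classes"])

# Index access alone is not enough: re rejects the object itself.
direct_search_failed = False
try:
    re.search(r"expressions", text)
except TypeError:
    direct_search_failed = True

# Searching the flattened copy works, even across a chunk boundary.
match = re.search(r"expressions", str(text))
```

The cost is a full copy of the text for every search, which may matter for very large inputs.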
  • Andrew Dalke at Sep 19, 2001 at 7:25 am
    Juan Valiño:
    > Is it possible to use the regular expression engine with other string-like
    > structures?
    Sometimes. They work on 'array's

    Python 2.2a3+ (#6, Sep 17 2001, 05:05:24)
    [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import re
    >>> pat = re.compile("^(.*(?=dalke).*)$", re.MULTILINE)
    >>> import array
    >>> s = array.array("c", open("/etc/passwd").read())
    >>> m = pat.search(s)
    >>> m
    <SRE_Match object at 0x1202d1430>
    >>> m.group(1)
    array('c', 'dalke:*:100:100:Andrew Dalke:/home/dalke:/bin/tcsh')
    >>>

    but they don't seem to work with memory mapped files. (Hard to
    tell, since I always get confused working with the mmap file.
    Indeed, just caused a bus error on Linux playing around with this.
    Now to report the problem on sf.)

    Andrew
    dalke at dalkescientific.com
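For readers on Python 3, a rough equivalent of the array session above might look like this. It is a sketch under a few assumptions: the `"c"` typecode no longer exists, so `"B"` (unsigned byte) is used instead; the pattern has to be a bytes pattern; and the sample data is made up rather than read from `/etc/passwd`.

```python
import array
import re

# Made-up sample data standing in for the contents of /etc/passwd.
data = (
    b"root:*:0:0:root:/root:/bin/sh\n"
    b"dalke:*:100:100:Andrew Dalke:/home/dalke:/bin/tcsh\n"
)

# Searching a bytes-like object requires a bytes pattern (note the rb prefix).
pat = re.compile(rb"^(.*(?=dalke).*)$", re.MULTILINE)

# "B" stores one unsigned byte per element, so the array exposes a
# single-byte buffer that the re module can scan directly.
s = array.array("B", data)
m = pat.search(s)

# Convert explicitly so the result is plain bytes whatever group() returns.
line = bytes(m.group(1))
```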
  • Nicholas FitzRoy-Dale at Sep 19, 2001 at 7:55 am

    On 19 Sep, Andrew Dalke wrote:
    > Juan Valiño:
    >> Is it possible to use the regular expression engine with other string-like
    >> structures?
    > Sometimes. They work on 'array's
    >
    > Python 2.2a3+ (#6, Sep 17 2001, 05:05:24)
    > [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
    > Type "help", "copyright", "credits" or "license" for more information.
    > >>> import re
    > >>> pat = re.compile("^(.*(?=dalke).*)$", re.MULTILINE)
    > >>> import array
    > >>> s = array.array("c", open("/etc/passwd").read())
    > >>> m = pat.search(s)
    > >>> m
    > <SRE_Match object at 0x1202d1430>
    > >>> m.group(1)
    > array('c', 'dalke:*:100:100:Andrew Dalke:/home/dalke:/bin/tcsh')
    > but they don't seem to work with memory mapped files.
    I'm doing regular expression search on an MMAPed file without issues.
  • Fredrik Lundh at Sep 19, 2001 at 9:52 am

    Juan Valiño wrote:
    > Is it possible to use the regular expression engine with other string-like
    > structures?
    regular expressions can be used on anything that implements
    the buffer low-level API (single segment read, with an element
    size equal to 1 or sizeof(Py_UNICODE)).

    if that doesn't make any sense to you, the answer is "no" ;-)

    </F>
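In Python 3 terms, the rule above roughly means "anything bytes-like". The sketch below is an approximation, not the check the re module itself performs: `is_single_byte_buffer` is a hypothetical helper that uses `memoryview` to test whether an object exposes a contiguous buffer of one-byte items before handing it to a bytes pattern.

```python
import array
import re


def is_single_byte_buffer(obj):
    """Return True if obj exposes a contiguous buffer of one-byte items."""
    try:
        view = memoryview(obj)
    except TypeError:
        # The object does not implement the buffer protocol at all.
        return False
    return view.contiguous and view.itemsize == 1


pattern = re.compile(rb"dal+ke")

candidates = {
    "bytes": b"user dalke here",
    "bytearray": bytearray(b"user dalke here"),
    "array('B')": array.array("B", b"user dalke here"),
    "list of str": ["user ", "dalke", " here"],
}

results = {}
for name, obj in candidates.items():
    if is_single_byte_buffer(obj):
        results[name] = pattern.search(obj) is not None
    else:
        results[name] = None  # not searchable without converting first
```

A list of strings fails the check, which matches the "no" answer for a home-grown string class that only offers indexing.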
  • Andrew Dalke at Sep 19, 2001 at 10:19 am
    [Note: proposed change to mmap.mmap at the end of this message.]

    Nicholas FitzRoy-Dale wrote:
    > I'm doing regular expression search on an MMAPed file without issues.
    > From code to index an mbox-style file:
    > mboxMap = mmap.mmap (handle.fileno(), getFileLength (self.sourceFilename),
    >     mmap.MAP_SHARED, mmap.PROT_READ)
    Well, there's my problem. I've nearly no clue on how to use
    the mmap module, so I assumed I could use the defaults.

    Just tried out what you did, and it works.

    The following doesn't work:

    $ ./python
    Python 2.2a3+ (#7, Sep 19 2001, 03:31:19)
    [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> handle = open("/etc/passwd", "r")
    >>> import mmap
    >>> map = mmap.mmap(handle.fileno(), 0)
    >>> import re
    >>> pat = re.compile("^(.*(?=dalke).*)$", re.MULTILINE)
    >>> m = pat.search(map)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: expected string or buffer
    >>>

    In this case, the mmap call is missing the MAP_SHARED flag
    (which is the default, so should be fine). It's also missing
    the PROT_READ flag, so the default of PROT_READ | PROT_WRITE
    is used.

    I guess that's the problem, but you can see why the error message
    threw me off.



    Would it be useful if mmap.mmap was changed to something like

    import mmap

    def proposed_mmap(file, size = 0, flags = mmap.MAP_SHARED,
                      prot = None):
        # Roughly like PyObject_AsFileDescriptor in Objects/fileobject.c
        if hasattr(file, "fileno"):
            fileno = file.fileno()
        else:
            fileno = file

        # See if we need to figure out the default for "prot"
        if prot is None:
            # File-like object may have a "mode" attribute defined
            # If so, use it, otherwise default to "rw"
            mode = getattr(file, "mode", "rw")
            prot = 0
            if "r" in mode:
                prot |= mmap.PROT_READ
            # "+" also means writable, e.g. "r+"
            if "w" in mode or "+" in mode:
                prot |= mmap.PROT_WRITE
            if prot == 0:
                # PROT_NONE may not be exported by the mmap module
                prot = getattr(mmap, "PROT_NONE", 0)

        return mmap.mmap(fileno, size, flags, prot)

    This would allow people like me to do

    handle = open("filename")
    map = mmap.mmap(handle)

    and have it just work. (Unless I do 'mmap.mmap(open("filename"))'
    since then I'll lose a reference count and the file handle gets
    closed from underneath me. I think.)

    Comments?

    Andrew
    dalke at dalkescientific.com
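A portable variant of the proposal can be sketched for Python 3 using the `access` keyword instead of the Unix-only `flags`/`prot` pair. `mmap_for_file` is a hypothetical helper, not part of the mmap module, and the file contents below are made up for the demonstration.

```python
import mmap
import os
import re
import tempfile


def mmap_for_file(file, length=0):
    """Map an open binary file, picking the access mode from file.mode.

    Hypothetical helper: the mapping is writable only when the file was
    opened for both reading and writing (a "+" in the mode string).
    """
    mode = getattr(file, "mode", "rb")
    access = mmap.ACCESS_WRITE if "+" in mode else mmap.ACCESS_READ
    return mmap.mmap(file.fileno(), length, access=access)


# Demonstration on a temporary file with made-up, passwd-like contents.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as out:
        out.write(
            b"root:*:0:0:root:/root:/bin/sh\n"
            b"dalke:*:100:100:Andrew Dalke:/home/dalke:/bin/tcsh\n"
        )
    pat = re.compile(rb"^(.*(?=dalke).*)$", re.MULTILINE)
    with open(path, "rb") as handle, mmap_for_file(handle) as mapped:
        match = pat.search(mapped)
        # Copy the group out while the mapping is still open.
        found = bytes(match.group(1)) if match else None
finally:
    os.remove(path)
```

Keeping the file object in a `with` block also sidesteps the worry about the handle being closed while the mapping is in use.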
  • Nicholas FitzRoy-Dale at Sep 19, 2001 at 1:43 pm

    On 19 Sep, Andrew Dalke wrote:

    > Nicholas FitzRoy-Dale wrote
    >> mboxMap = mmap.mmap (handle.fileno(), getFileLength (self.sourceFilename),
    >>     mmap.MAP_SHARED, mmap.PROT_READ)
    > Well, there's my problem. I've nearly no clue on how to use
    > the mmap module, so I assumed I could use the defaults.
    Ah, right.
    > Just tried out what you did, and it works.
    >
    > The following doesn't work: <snip>
    > In this case, the mmap call is missing the MAP_SHARED flag
    > (which is the default, so should be fine). It's also missing
    > the PROT_READ flag, so the default of PROT_READ | PROT_WRITE
    > is used.
    >
    > I guess that's the problem, but you can see why the error message
    > threw me off.
    Actually, I just did a bit of testing, and it seems that it's the file
    length param that's causing that error - supplying 0 doesn't work. I
    initially supplied a very large file length and predictably got sigsegv
    or something when the re ran off the end of the file :-) so I actually
    supplied the correct length and it worked.

    I agree that "expected string or buffer" doesn't come close to
    indicating "you've got a zero-length mmap, fix!" though. :)

    --
    - Nicholas FitzRoy-Dale
    http://www.lardcave.net

    /. signature:
    "Hit a man on the head with a fish, and he'll have a headache for a day..."
  • Andrew Dalke at Sep 19, 2001 at 6:52 pm
    Nicholas FitzRoy-Dale:
    > Actually, I just did a bit of testing, and it seems that it's the file
    > length param that's causing that error - supplying 0 doesn't work. I
    > initially supplied a very large file length and predictably got sigsegv
    > or something when the re ran off the end of the file :-) so I actually
    > supplied the correct length and it worked.
    I used 0 since the docs say

        If length is 0, the maximum length of the map will be the
        current size of the file when mmap() is called.

    but upon rereading the docs I see that's Windows specific, and
    it says nothing about what 0 means when passed to the Unix call.

    On Unix, the implementation passes the size value, unmodified,
    to the mmap(2) function. The man page for that function doesn't
    mention anything special about how it treats a size of 0.

    So I yet again misunderstood how the mmap module is supposed to
    be used. :(

    I believe it needs better documentation. I believe I don't
    know enough about mmap to provide that documentation.

    Andrew
    dalke at dalkescientific.com
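The length question discussed above applies to the Python of 2001. In current Python 3, the mmap documentation describes a length of 0 as mapping the whole file on Unix as well as on Windows. A small sketch of how one might check that on a given system, using a temporary file with made-up contents:

```python
import mmap
import os
import tempfile

payload = b"dalke:*:100:100:Andrew Dalke:/home/dalke:/bin/tcsh\n"

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as out:
        out.write(payload)
    file_size = os.path.getsize(path)

    with open(path, "rb") as handle:
        # length=0 asks for a mapping that covers the whole file.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as whole:
            whole_len = len(whole)
        # An explicit length maps only that many bytes from the start.
        with mmap.mmap(handle.fileno(), 5, access=mmap.ACCESS_READ) as head:
            head_bytes = head[:]
finally:
    os.remove(path)
```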

Discussion Overview
group: python-list
categories: python
posted: Sep 18, '01 at 10:16p
active: Sep 19, '01 at 6:52p
posts: 8
users: 5
website: python.org
