FAQ
Hello :-)

Some time ago I have written one of the zillion programs that read some
kind of format file(s) and make HTML from them. The program is able to
convert <<I italic text>> to <I>italic text</I> or <<LINK link;text>> to
<A HREF="link">text</A>.

However, currently I can't convert <<LINK link;<<I italic text>>>> to
<A HREF="link"><I>italic text</I></A>. The relevant code is

-----8<---------------------------------------------------------------

######################################################################
# perform substitutions
# <<link url;url_text>>, url_text defaults to url
link_pattern = re.compile( '(?si)<<link (.+?)(?:;(.*?))?>>' )

def make_link( matchobj ):

url, url_text = matchobj.groups()
if not url_text: # use url as url_text by default
url_text = url
url, url_text = map( string.strip, [ url, url_text ] )

return string.join( (
html_format.link_format[ 0 ],
url,
html_format.link_format[ 1 ],
url_text,
html_format.link_format[ 2 ] ), '' )

# evaluate some formatting in the text to legal code
def make_html( text ):
# order matters, - conversion to links has to be come first
text = re.sub( link_pattern, make_link, text )
text = re.sub( r'<<(\S+)\s(.*?)>>', r'<\1>\2</\1>', text )
text = re.sub( r'(?i)<PROG>(.*?)</PROG>', r'<EM>\1</EM>', text )
text = re.sub( r'(?i)<FILE>(.*?)</FILE>', r'<EM>\1</EM>', text )
text = re.sub( r'(?i)<OPT>(.*?)</OPT>', r'<STRONG>\1</STRONG>', text )
return text

-----8<---------------------------------------------------------------

Now the question: Which is the best way to enable parsing of recursive
parsing as mentioned in the example above?

So far I have thought of two ways. One may be to extend the regular
expression(s), but this is already cumbersome to read. The other
possibility would be to scan the string and replace <<...>> occurences
which don't contain <<, perhaps multiple times, until all patterns are
substituted.

I hope there is an easy way that I simply have overlooked 8-) .
Any suggestions are appreciated. Thank you in advance :) .

Stefan

Search Discussions

  • Felix Thibault at Mar 7, 2000 at 12:45 am

    At 00:03 3/4/00 +0100, Stefan Schwarzer wrote:
    Hello :-)

    Some time ago I have written one of the zillion programs that read some
    kind of format file(s) and make HTML from them. The program is able to
    convert <<I italic text>> to <I>italic text</I> or <<LINK link;text>> to
    <A HREF="link">text</A>.

    However, currently I can't convert <<LINK link;<<I italic text>>>> to
    <A HREF="link"><I>italic text</I></A>. The relevant code is

    -----8<---------------------------------------------------------------

    ######################################################################
    # perform substitutions
    # <<link url;url_text>>, url_text defaults to url
    link_pattern = re.compile( '(?si)<<link (.+?)(?:;(.*?))?>>' )

    def make_link( matchobj ):

    url, url_text = matchobj.groups()
    if not url_text: # use url as url_text by default
    url_text = url
    url, url_text = map( string.strip, [ url, url_text ] )

    return string.join( (
    html_format.link_format[ 0 ],
    url,
    html_format.link_format[ 1 ],
    url_text,
    html_format.link_format[ 2 ] ), '' )

    # evaluate some formatting in the text to legal code
    def make_html( text ):
    # order matters, - conversion to links has to be come first
    text = re.sub( link_pattern, make_link, text )
    text = re.sub( r'<<(\S+)\s(.*?)>>', r'<\1>\2</\1>', text )
    text = re.sub( r'(?i)<PROG>(.*?)</PROG>', r'<EM>\1</EM>', text )
    text = re.sub( r'(?i)<FILE>(.*?)</FILE>', r'<EM>\1</EM>', text )
    text = re.sub( r'(?i)<OPT>(.*?)</OPT>', r'<STRONG>\1</STRONG>', text )
    return text

    -----8<---------------------------------------------------------------

    Now the question: Which is the best way to enable parsing of recursive
    parsing as mentioned in the example above?

    So far I have thought of two ways. One may be to extend the regular
    expression(s), but this is already cumbersome to read. The other
    possibility would be to scan the string and replace <<...>> occurences
    which don't contain <<, perhaps multiple times, until all patterns are
    substituted.
    A while back I had read that you can't do arbitrarily deeply nested
    stuff with just a re, and I came up with this (re-less) way:

    -find where every opening and closing delimiter is in the text:

    #a func like this is in TextTools, but I couldn't find it in the #standard
    library...

    def findall(text, substring):
    """findall(text, substring) -> slices of text substring occurs in.
    """
    end, slices, size = 0, [], len(substring)
    for piece in string.split(text, substring)[:-1]:
    start = end+len(piece)
    end = start+size
    slices.append((start, end))
    return slices

    openslices, closeslices = findall(text, '<<'), findall(text, '>>')

    -merge these two lists, and make a dictionary so you can tell if an
    index means open a new tag or close an old one

    def merge_n_sort(*klvtupes):
    """merge_n_sort(*klvtupes) -> sortedlist, dict

    Take in a set of 2-ples made of a list of slices and
    a delimiter and return a sorted list of all left
    indices and a {left: delimiter} dictionary."""
    indexdict = {}
    for keylist, delimiter in klvtupes:
    for left, right in keylist:
    indexdict[left] = delimiter
    indices = indexdict.keys()
    indices.sort()
    return indices, indexdict

    indices, indexdict = merge_n_sort((openslices, '<<'),
    (closeslices, '>>'))

    -Then you just do something like:

    stack, out = (), []
    for i in range(len(indices)):

    if indexdict[indices[i]] == '<<':
    ...handle opening the tag and
    stack = tagname, stack #push...

    else: #indexdict[indices[i]] == '>>':
    try:
    tagname, stack = stack #...pop
    ...handle closing the tag
    except ValueError: #eg: (mismatched)parens)
    pass
    return string.join(out)

    (nb: you have to re-write findall if you have escaped stuff
    this is not-very-tested code.)
    I hope there is an easy way that I simply have overlooked 8-) .
    Any suggestions are appreciated. Thank you in advance :) .

    Stefan
    --
    http://www.python.org/mailman/listinfo/python-list
    hopefully-helpfully y'rs,
    Felix

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppython-list @
categoriespython
postedMar 3, '00 at 11:03p
activeMar 7, '00 at 12:45a
posts2
users2
websitepython.org

People

Translate

site design / logo © 2022 Grokbase