HTML comments aren't supposed to be nested, nor are they supposed to enclose
unescaped HTML tags, but people routinely commit both sins anyway. People
also forget to close HTML comments, but for the most part, browsers still
seem to display such pages more or less correctly.

I have an HTML comment stripping function which handles the nesting part
okay:

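import re

def zapcomment(data):
    # Split on comment delimiters; the capturing group keeps them in the list.
    data = re.split("(<!--|-->)", data)
    nest = 0                        # current comment nesting depth
    newdata = []
    for piece in data:
        if piece == "<!--":
            nest += 1
        elif piece == "-->":
            nest = max(0, nest-1)   # ignore unmatched closers
        elif nest == 0:
            newdata.append(piece)   # keep only text outside any comment
    return "".join(newdata)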

but I'm sort of at a loss as to how to handle runaway comments (an opening
<!-- with no matching -->), e.g.:

<script language="JavaScript" type="text/javascript">
<!--
<!-- Hide script from old browsers
myPix1 = new Array("gp1/gp1-pic1.gif","gp1/gp1-pic2.gif","gp1/gp1-pic3.gif","gp1/gp1-pic4.gif")
myPix2 = new Array("gp2/gp2-pic1.gif","gp2/gp2-pic2.gif","gp2/gp2-pic3.gif","gp2/gp2-pic4.gif")
myPix3 = new Array("gp3/gp3-pic1.gif","gp3/gp3-pic2.gif","gp3/gp3-pic3.gif","gp3/gp3-pic4.gif")
myPix4 = new Array("gp4/gp4-pic1.gif","gp4/gp4-pic2.gif","gp4/gp4-pic3.gif","gp4/gp4-pic4.gif")
function choosePix() {
  if (document.images) {
    randomNum = Math.floor((Math.random() * myPix1.length))
    document.myPicture1.src = myPix1[randomNum]

    randomNum = Math.floor((Math.random() * myPix2.length))
    document.myPicture2.src = myPix2[randomNum]

    randomNum = Math.floor((Math.random() * myPix3.length))
    document.myPicture3.src = myPix3[randomNum]

    randomNum = Math.floor((Math.random() * myPix4.length))
    document.myPicture4.src = myPix4[randomNum]
  }
}
// End hiding script from old browsers -->
</script>

Anybody out there got a bit of code which implements a useful heuristic for
that case? Ideally, stripping comments from the above would yield

<script language="JavaScript" type="text/javascript">
</script>

Thanks,

--
Skip Montanaro - skip at pobox.com
http://www.mojam.com/
http://www.musi-cal.com/
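To see where the nesting approach breaks down, here is a quick check of the zapcomment function from the question (copied so the snippet runs on its own) against a condensed, made-up stand-in for the script block above. The two <!-- markers push the depth to 2 and the single --> only brings it back to 1, so everything after it, including </script>, is discarded.

```python
import re

def zapcomment(data):
    # Nesting-depth stripper from the question above.
    nest = 0
    newdata = []
    for piece in re.split("(<!--|-->)", data):
        if piece == "<!--":
            nest += 1
        elif piece == "-->":
            nest = max(0, nest - 1)
        elif nest == 0:
            newdata.append(piece)
    return "".join(newdata)

# Condensed stand-in for the script block: two openers, one closer.
sample = ('<script>\n'
          '<!--\n'
          '<!-- Hide script from old browsers\n'
          'choosePix()\n'
          '// End hiding script from old browsers -->\n'
          '</script>\n')

print(repr(zapcomment(sample)))  # '<script>\n'; the closing tag is lost
```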


  • Dennis Reinhardt at Nov 8, 2002 at 5:36 pm

    > Anybody out there got a bit of code which implements a useful heuristic for
    > that case? Ideally, stripping comments from the above would yield

    I don't have code, but I do have a heuristic. Rather than tracking nesting
    depth, I would scan for the next token in the input stream using a
    two-state algorithm:

    Step 1: if <!-- is earliest in input, emit unprocessed text to left of <!--
            and go to step 2
            if --> is earliest in input, delete unprocessed text and remain
            in step 1
            if end of file is earliest in input, emit remaining unprocessed
            text and exit

    Step 2: if <!-- is earliest in input, delete unprocessed text including <!--
            and remain in step 2
            if --> is earliest in input, delete unprocessed text
            including --> and go to step 1
            if end of file is earliest in input, delete remaining unprocessed
            text and exit

    hth
    --

    Dennis Reinhardt

    http://www.dair.com
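Dennis's two-state heuristic translates fairly directly to Python. What follows is a minimal sketch, not code from the thread: the name zapcomment2 is hypothetical, and it assumes "unprocessed text" means the text between the previous delimiter and the current one, with the delimiter itself always discarded. Unlike the nesting version, a second <!-- inside a comment does not deepen anything, so the first --> always closes the comment.

```python
import re

def zapcomment2(data):
    """Hypothetical sketch of the two-state comment-stripping heuristic.

    Step 1 (outside a comment): text before "<!--" is emitted and we move
    to step 2; text before a stray "-->" is deleted. Step 2 (inside a
    comment): text and any further "<!--" are deleted until the first
    "-->", which returns us to step 1. At end of input, leftover text is
    emitted in step 1 and deleted in step 2. Delimiters are never emitted.
    """
    in_comment = False   # False = step 1, True = step 2
    pending = ""         # unprocessed text since the previous delimiter
    out = []
    # The capturing group makes re.split keep the delimiters, so the list
    # alternates: text, delimiter, text, ..., text.
    for piece in re.split("(<!--|-->)", data):
        if piece == "<!--":
            if not in_comment:
                out.append(pending)   # step 1: emit text left of <!--
                in_comment = True     # go to step 2
            pending = ""              # step 2: delete text, stay in step 2
        elif piece == "-->":
            pending = ""              # delete text in either step
            in_comment = False        # step 2 -> step 1; step 1 stays put
        else:
            pending = piece
    if not in_comment:
        out.append(pending)           # end of file in step 1: emit the rest
    return "".join(out)               # end of file in step 2: rest is dropped

sample = ('<script>\n'
          '<!--\n'
          '<!-- Hide script from old browsers\n'
          'choosePix()\n'
          '// End hiding script from old browsers -->\n'
          '</script>\n')

print(repr(zapcomment2(sample)))  # '<script>\n\n</script>\n'
```

On the condensed sample this gives the desired result plus the blank line that sat between the two tags. The trade-off is that a stray --> outside any comment throws away the text back to the previous delimiter, so "a --> b" comes out as " b".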

Discussion Overview
group: python-list @
categories: python
posted: Nov 8, '02 at 11:00a
active: Nov 8, '02 at 5:36p
posts: 2
users: 2
website: python.org

