FAQ
I have an XML file that has an element called "Node". These can be nested to any depth and the depth of the nesting is not known to me. I need to parse the file and preserve the nesting. For example, if the XML file had:


<Node Name="A">
    <Node Name="B">
       <Node Name="C">
         <Node Name="D">
           <Node Name="E">


When I'm parsing Node "E" I need to know I'm in A/B/C/D/E. Problem is I don't know how deep this can be. This is the code I have so far:


nodes = []


def parseChild(c):
     if c.tag == 'Node':
         if 'Name' in c.attrib:
             nodes.append(c.attrib['Name'])
         for c1 in c:
             parseChild(c1)
     else:
         for node in nodes:
             print node,
         print c.tag


for parent in tree.getiterator():
     for child in parent:
         for x in child:
             parseChild(x)


My problem is that I don't know when I'm done with a node and I should remove a level of nesting. I would think this is a fairly common situation, but I could not find any examples of parsing a file like this. Perhaps I'm going about it completely wrong.


  • Chris Angelico at Nov 25, 2013 at 10:30 pm

    On Tue, Nov 26, 2013 at 9:22 AM, Larry.Martell at gmail.com wrote:
    I have an XML file that has an element called "Node". These can be nested to any depth and the depth of the nesting is not known to me. I need to parse the file and preserve the nesting. For example, if the XML file had:

    <Node Name="A">
    <Node Name="B">
    <Node Name="C">
    <Node Name="D">
    <Node Name="E">

    First off, please clarify: Are there five corresponding </Node> tags
    later on? If not, it's not XML, and nesting will have to be defined
    some other way.


    Secondly, please get off Google Groups. Your initial post is
    malformed, and unless you specifically fight the software, your
    replies will be even more malformed, to the point of being quite
    annoying. There are many other ways to read a newsgroup, or you can
    subscribe to the mailing list python-list at python.org, which carries
    the same content.


    ChrisA
  • Larry Martell at Nov 25, 2013 at 10:45 pm

    On Monday, November 25, 2013 5:30:44 PM UTC-5, Chris Angelico wrote:


    On Tue, Nov 26, 2013 at 9:22 AM, Larry.Martell at gmail.com

    wrote:
    I have an XML file that has an element called "Node". These can be nested to any depth and the depth of the nesting is not known to me. I need to parse the file and preserve the nesting. For example, if the XML file had:

    <Node Name="A">
    <Node Name="B">
    <Node Name="C">
    <Node Name="D">
    <Node Name="E">


    First off, please clarify: Are there five corresponding </Node> tags
    later on? If not, it's not XML, and nesting will have to be defined
    some other way.

    Yes, there are corresponding </Node> tags. I just didn't show them.

    Secondly, please get off Google Groups. Your initial post is
    malformed, and unless you specifically fight the software, your
    replies will be even more malformed, to the point of being quite
    annoying. There are many other ways to read a newsgroup, or you can
    subscribe to the mailing list python-list at python.org, which carries
    the same content.

    Not sure what you mean by malformed. I don't really care for Google
    Groups, but I've been using it to post to this and other groups for
    years (since rn and deja news went away) and no one ever said my posts
    were malformed. In any case, I did not know the group was available as
    a ML. I've subbed to that and will post that way.
  • Chris Angelico at Nov 25, 2013 at 11:19 pm

    On Tue, Nov 26, 2013 at 9:45 AM, Larry Martell wrote:
    On Monday, November 25, 2013 5:30:44 PM UTC-5, Chris Angelico wrote:

    First off, please clarify: Are there five corresponding </Node> tags
    later on? If not, it's not XML, and nesting will have to be defined
    some other way.
    Yes, there are corresponding </Node> tags. I just didn't show them.

    Good good, I just saw the "unbounded" in your subject line and got
    worried :) I'm pretty sure there's a way to parse that will preserve
    the current nesting information, but others can describe that better
    than I can.

    Secondly, please get off Google Groups. Your initial post is
    malformed, and unless you specifically fight the software, your
    replies will be even more malformed, to the point of being quite
    annoying. There are many other ways to read a newsgroup, or you can
    subscribe to the mailing list python-list at python.org, which carries
    the same content.
    Not sure what you mean by malformed. I don't really care for Google Groups,
    but I've been using it to post to this and other groups for years (since rn
    and deja news went away) and no one ever said my posts were malformed. In
    any case, I did not know the group was available as a ML. I've subbed to
    that and will post that way.

    The mailing list works well for me too. Google Groups is deceptively
    easy for a lot of people, but if you look through the list's archives,
    you'll see that the posts it makes are unwrapped (and thus string out
    to the right an arbitrary length), and all quoted text is
    double-spaced, among other problems. Its users are generally unaware
    of this, and like you are not maliciously inflicting that on us all,
    but that doesn't make it any less painful to read :) Thanks for
    switching.


    ChrisA
  • Larry Martell at Nov 25, 2013 at 11:25 pm

    On Mon, Nov 25, 2013 at 6:19 PM, Chris Angelico wrote:

    On Tue, Nov 26, 2013 at 9:45 AM, Larry Martell wrote:
    On Monday, November 25, 2013 5:30:44 PM UTC-5, Chris Angelico wrote:

    First off, please clarify: Are there five corresponding </Node> tags
    later on? If not, it's not XML, and nesting will have to be defined
    some other way.
    Yes, there are corresponding </Node> tags. I just didn't show them.
    Good good, I just saw the "unbounded" in your subject line and got
    worried :) I'm pretty sure there's a way to parse that will preserve
    the current nesting information, but others can describe that better
    than I can.

    The term 'unbounded' is used in the XML xsd file like this:


    <xs:sequence maxOccurs="unbounded">



    Secondly, please get off Google Groups. Your initial post is
    malformed, and unless you specifically fight the software, your
    replies will be even more malformed, to the point of being quite
    annoying. There are many other ways to read a newsgroup, or you can
    subscribe to the mailing list python-list at python.org, which carries
    the same content.
    Not sure what you mean by malformed. I don't really care for Google Groups,
    but I've been using it to post to this and other groups for years (since rn
    and deja news went away) and no one ever said my posts were malformed. In
    any case, I did not know the group was available as a ML. I've subbed to
    that and will post that way.
    The mailing list works well for me too. Google Groups is deceptively
    easy for a lot of people, but if you look through the list's archives,
    you'll see that the posts it makes are unwrapped (and thus string out
    to the right an arbitrary length), and all quoted text is
    double-spaced, among other problems. Its users are generally unaware
    of this, and like you are not maliciously inflicting that on us all,
    but that doesn't make it any less painful to read :) Thanks for
    switching.
    I had noticed the double spacing and I always fixed that when I replied.
  • Rusi at Nov 28, 2013 at 1:31 pm

    On Tuesday, November 26, 2013 4:55:55 AM UTC+5:30, Larry.... at gmail.com wrote:
    Not sure what you mean by malformed. I don't really care for Google Groups,
    but I've been using it to post to this and other groups for years (since rn
    and deja news went away) and no one ever said my posts were malformed. In
    any case, I did not know the group was available as a ML. I've subbed to
    that and will post that way.
    The mailing list works well for me too. Google Groups is deceptively
    easy for a lot of people, but if you look through the list's archives,
    you'll see that the posts it makes are unwrapped (and thus string out
    to the right an arbitrary length), and all quoted text is
    double-spaced, among other problems. Its users are generally unaware
    of this, and like you are not maliciously inflicting that on us all,
    but that doesn't make it any less painful to read :) Thanks for
    switching.
    I had noticed the double spacing and I always fixed that when I replied.

    Here's what I do to manage the GG-headaches:


    1. Firefox needs to have the "Its all text" addon installed
    https://addons.mozilla.org/en-US/firefox/addon/its-all-text/


    2. Set the editor in "Its all text" to emacs
        [You can use anything, including pure Python; more on that below]


    3. Put the following into your emacs init
    -----------------
    ;; Clean up Google Groups extra newlines containing only "> "


    (defun clean-gg ()
       (interactive)
       (replace-regexp "^> *\n> *\n> *$" "-=\=-" nil 0 (point-max))
       (flush-lines "> *$" 0 (point-max))
       (replace-regexp "-=\=-" "" nil 0 (point-max))
    ; (save-buffers-kill-terminal t)
    )




    (global-set-key (kbd "<f9>") 'clean-gg)


    ;(push 'clean-gg find-file-hook)
    ----------------


    Now Firefox will show a small new "edit" button in the text window.
    Clicking that puts you into emacs with the text of the message.


    Now F9 will cleanup the double-spaces.


    Now, depending on whether you are comfortable with emacs or not you
    can do either of:


    1. Continue editing in emacs.
        M-q and/or auto-fill-mode will clean up long-line paras
        Save-quit will put you back into firefox with cleaned up text


    2. Not comfortable with emacs? Just F9 and save-quit will get you back
        to firefox with the double spacing cleaned up.
        The long lines problem remains in this case


    Don't like emacs?


    1. If you know how to write similar code for vi (or whatever) you are
        set.
    2. You can also set up emacs to clean up and close immediately
    3. You can also set up your 'editor' to be a pure Python script
        [I've not got round to doing it because I'm not sure how to
        catch and report errors in a proper cross-platform way.]
    4. If you are a javascript/greasemonkey expert I guess you can convert
        the emacs-code to JS/GM code and that would be a zero-click
        solution.
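
For the pure Python option, the clean-up step itself might look roughly like the sketch below. It only mirrors the three steps of clean-gg (protect real paragraph breaks, drop the quote-only lines, restore the breaks); the MARKER value is an arbitrary placeholder chosen here, and reading or writing the file the browser addon hands to the editor is not covered.

```python
import re

# Arbitrary placeholder (an assumption of this sketch): any string that cannot
# occur in a message and does not end in ">" would do.
MARKER = "\x00PARA\x00"


def clean_gg(text):
    # Three consecutive quote-only lines stand for a real paragraph break,
    # so protect them before the next step deletes quote-only lines.
    text = re.sub(r"^> *\n> *\n> *$", MARKER, text, flags=re.MULTILINE)
    # Drop every line that ends in a bare ">" (the spurious double spacing),
    # mirroring flush-lines with the pattern "> *$".
    kept = [line for line in text.split("\n") if not re.search(r"> *$", line)]
    # Turn the protected breaks into empty lines.
    return "\n".join(kept).replace(MARKER, "")


sample = "> a\n> \n> b\n> \n> \n> \n> c"
print(clean_gg(sample))
```

Like the emacs version, this leaves the long-lines problem untouched.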


    Usually use emacs?
    You may prefer emacsclient to emacs for the editor
  • Stefan Behnel at Nov 26, 2013 at 7:38 am

    Larry.Martell... at gmail.com, 25.11.2013 23:22:
    I have an XML file that has an element called "Node". These can be nested to any depth and the depth of the nesting is not known to me. I need to parse the file and preserve the nesting. For example, if the XML file had:

    <Node Name="A">
    <Node Name="B">
    <Node Name="C">
    <Node Name="D">
    <Node Name="E">

    When I'm parsing Node "E" I need to know I'm in A/B/C/D/E. Problem is I don't know how deep this can be. This is the code I have so far:

    nodes = []

    def parseChild(c):
        if c.tag == 'Node':
            if 'Name' in c.attrib:
                nodes.append(c.attrib['Name'])
            for c1 in c:
                parseChild(c1)
        else:
            for node in nodes:
                print node,
            print c.tag

    for parent in tree.getiterator():
        for child in parent:
            for x in child:
                parseChild(x)

    This seems hugely redundant. tree.getiterator() already returns a recursive
    iterable, and then, for each node in your document, you are running
    recursively over its entire subtree, meaning that you'll visit each node as
    many times as its depth in the tree.
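
The repeated visits can be made visible with a small counting sketch. It assumes the five-level example document with its closing tags filled in, replaces the printing in parseChild() with a counter, and uses tree.iter(), the current spelling of getiterator() (which was removed in Python 3.9). The names XML, visits and parse_child are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The example from the question, with the closing tags filled in.
XML = (
    '<Node Name="A"><Node Name="B"><Node Name="C">'
    '<Node Name="D"><Node Name="E"/></Node>'
    '</Node></Node></Node>'
)

tree = ET.ElementTree(ET.fromstring(XML))
visits = Counter()


def parse_child(c):
    # Same recursion as parseChild() in the question, but only counting visits.
    if c.tag == 'Node':
        visits[c.get('Name')] += 1
        for c1 in c:
            parse_child(c1)


# Same triple loop as in the question.
for parent in tree.iter():
    for child in parent:
        for x in child:
            parse_child(x)

print(dict(visits))  # {'C': 1, 'D': 2, 'E': 3}
```

With this input the count grows with the depth of the node, and A and B are never passed to the function at all, because the loop only starts recursing two levels below each parent.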



    My problem is that I don't know when I'm done with a node and I should
    remove a level of nesting. I would think this is a fairly common
    situation, but I could not find any examples of parsing a file like
    this. Perhaps I'm going about it completely wrong.

    Your recursive traversal function tells you when you're done. If you drop
    the getiterator() bit, reaching the end of parseChild() means that you're
    done with the element and start backing up. So you can simply pass down a
    list of element names that you append() at the beginning of the function
    and pop() at the end, i.e. a stack. That list will then always give you the
    current path from the root node.


    Alternatively, if you want to use lxml.etree instead of ElementTree, you
    can use its iterwalk() function, which gives you the same thing but
    without recursion, as a plain iterator.


    http://lxml.de/parsing.html#iterparse-and-iterwalk
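
For readers without lxml, a similar non-recursive traversal can be sketched with the standard library's ElementTree.iterparse(), which is a swapped-in technique here rather than the iterwalk() call mentioned above. The "start" and "end" events take the place of entering and leaving a recursive call. The sample document and the iter_paths name are assumptions made for the illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the poster's real file.
XML = b'<Node Name="A"><Node Name="B"><Other/></Node><Node Name="C"/></Node>'


def iter_paths(source):
    """Yield the '/'-joined Name path of each Node element, without recursion."""
    path = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag != "Node" or elem.get("Name") is None:
            continue
        if event == "start":
            # Entering a Node: push its name and report the current path.
            path.append(elem.get("Name"))
            yield "/".join(path)
        else:
            # Leaving the same Node: drop one level of nesting.
            path.pop()


paths = list(iter_paths(io.BytesIO(XML)))
print(paths)  # ['A', 'A/B', 'A/C']
```

Attributes are already available when the "start" event fires, so the name can be pushed at that point; only the element's text and children are incomplete until "end".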


    Stefan
  • Larry Martell at Nov 26, 2013 at 12:23 pm

    On Tue, Nov 26, 2013 at 2:38 AM, Stefan Behnel wrote:
    Larry.Martell... at gmail.com, 25.11.2013 23:22:
    I have an XML file that has an element called "Node". These can be nested to any depth and the depth of the nesting is not known to me. I need to parse the file and preserve the nesting. For example, if the XML file had:

    <Node Name="A">
    <Node Name="B">
    <Node Name="C">
    <Node Name="D">
    <Node Name="E">

    When I'm parsing Node "E" I need to know I'm in A/B/C/D/E. Problem is I don't know how deep this can be. This is the code I have so far:

    nodes = []

    def parseChild(c):
        if c.tag == 'Node':
            if 'Name' in c.attrib:
                nodes.append(c.attrib['Name'])
            for c1 in c:
                parseChild(c1)
        else:
            for node in nodes:
                print node,
            print c.tag

    for parent in tree.getiterator():
        for child in parent:
            for x in child:
                parseChild(x)
    This seems hugely redundant. tree.getiterator() already returns a recursive
    iterable, and then, for each node in your document, you are running
    recursively over its entire subtree. Meaning that you'll visit each node as
    many times as its depth in the tree.

    My problem is that I don't know when I'm done with a node and I should
    remove a level of nesting. I would think this is a fairly common
    situation, but I could not find any examples of parsing a file like
    this. Perhaps I'm going about it completely wrong.
    Your recursive traversal function tells you when you're done. If you drop
    the getiterator() bit, reaching the end of parseChild() means that you're
    done with the element and start backing up. So you can simply pass down a
    list of element names that you append() at the beginning of the function
    and pop() at the end, i.e. a stack. That list will then always give you the
    current path from the root node.

    Thanks for the reply. How can I remove getiterator()? Then I won't be
    traversing the nodes of the tree. I can't iterate over tree. I am also
    unclear on where to do the pop(). I tried putting it just after the
    recursive call to parseChild() and I tried putting as the very last
    statement in parseChild() - neither one gave the desired result. Can
    you show me in code what you mean?


    Thanks!
    -larry

    Alternatively, if you want to use lxml.etree instead of ElementTree, you
    can use its iterwalk() function, which gives you the same thing but
    without recursion, as a plain iterator.

    http://lxml.de/parsing.html#iterparse-and-iterwalk
  • Stefan Behnel at Nov 26, 2013 at 1:20 pm

    Larry Martell, 26.11.2013 13:23:
    On Tue, Nov 26, 2013 at 2:38 AM, Stefan Behnel wrote:
    Larry.Martell... at gmail.com, 25.11.2013 23:22:
    I have an XML file that has an element called "Node". These can be nested to any depth and the depth of the nesting is not known to me. I need to parse the file and preserve the nesting. For example, if the XML file had:

    <Node Name="A">
    <Node Name="B">
    <Node Name="C">
    <Node Name="D">
    <Node Name="E">

    When I'm parsing Node "E" I need to know I'm in A/B/C/D/E. Problem is I don't know how deep this can be. This is the code I have so far:

    nodes = []

    def parseChild(c):
        if c.tag == 'Node':
            if 'Name' in c.attrib:
                nodes.append(c.attrib['Name'])
            for c1 in c:
                parseChild(c1)
        else:
            for node in nodes:
                print node,
            print c.tag

    for parent in tree.getiterator():
        for child in parent:
            for x in child:
                parseChild(x)
    This seems hugely redundant. tree.getiterator() already returns a recursive
    iterable, and then, for each node in your document, you are running
    recursively over its entire subtree. Meaning that you'll visit each node as
    many times as its depth in the tree.

    My problem is that I don't know when I'm done with a node and I should
    remove a level of nesting. I would think this is a fairly common
    situation, but I could not find any examples of parsing a file like
    this. Perhaps I'm going about it completely wrong.
    Your recursive traversal function tells you when you're done. If you drop
    the getiterator() bit, reaching the end of parseChild() means that you're
    done with the element and start backing up. So you can simply pass down a
    list of element names that you append() at the beginning of the function
    and pop() at the end, i.e. a stack. That list will then always give you the
    current path from the root node.
    Thanks for the reply. How can I remove getiterator()? Then I won't be
    traversing the nodes of the tree. I can't iterate over tree. I am also
    unclear on where to do the pop(). I tried putting it just after the
    recursive call to parseChild() and I tried putting as the very last
    statement in parseChild() - neither one gave the desired result. Can
    you show me in code what you mean?

    untested:


       nodes = []


       def process_subtree(c, path):
           name = c.get('Name') if c.tag == 'Node' else None
           if name:
               path.append(name)
               nodes.append('/'.join(path))


           for c1 in c:
               process_subtree(c1, path)


           if name:
               path.pop()


       process_subtree(tree.getroot(), [])
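
A self-contained variant of the sketch above, for trying it out without a file on disk. The sample document is made up, and the results are collected in a list passed as a third argument (called found here) instead of the module-level nodes list; otherwise the append-on-entry, pop-on-exit logic is the same.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real file's layout is not shown in the thread.
XML = (
    '<Root>'
    '<Node Name="A">'
    '<Node Name="B"><Item/></Node>'
    '<Node Name="C"><Node Name="D"/></Node>'
    '</Node>'
    '</Root>'
)


def process_subtree(elem, path, found):
    # Only named Node elements contribute a level to the path.
    name = elem.get('Name') if elem.tag == 'Node' else None
    if name:
        path.append(name)              # entering this Node
        found.append('/'.join(path))   # the full path is known right here

    for child in elem:
        process_subtree(child, path, found)

    if name:
        path.pop()                     # leaving this Node: back up one level


found = []
process_subtree(ET.fromstring(XML), [], found)
print(found)  # ['A', 'A/B', 'A/C', 'A/C/D']
```

The pop() sits after the loop over the children, not inside it, which is what keeps sibling nodes such as B and C from stacking onto each other's paths.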




    Stefan
  • Larry Martell at Nov 27, 2013 at 2:58 pm

    On Tue, Nov 26, 2013 at 8:20 AM, Stefan Behnel wrote:
    Larry Martell, 26.11.2013 13:23:
    On Tue, Nov 26, 2013 at 2:38 AM, Stefan Behnel wrote:
    Larry.Martell... at gmail.com, 25.11.2013 23:22:
    I have an XML file that has an element called "Node". These can be nested to any depth and the depth of the nesting is not known to me. I need to parse the file and preserve the nesting. For example, if the XML file had:

    <Node Name="A">
    <Node Name="B">
    <Node Name="C">
    <Node Name="D">
    <Node Name="E">

    When I'm parsing Node "E" I need to know I'm in A/B/C/D/E. Problem is I don't know how deep this can be. This is the code I have so far:

    nodes = []

    def parseChild(c):
        if c.tag == 'Node':
            if 'Name' in c.attrib:
                nodes.append(c.attrib['Name'])
            for c1 in c:
                parseChild(c1)
        else:
            for node in nodes:
                print node,
            print c.tag

    for parent in tree.getiterator():
        for child in parent:
            for x in child:
                parseChild(x)
    This seems hugely redundant. tree.getiterator() already returns a recursive
    iterable, and then, for each node in your document, you are running
    recursively over its entire subtree. Meaning that you'll visit each node as
    many times as its depth in the tree.

    My problem is that I don't know when I'm done with a node and I should
    remove a level of nesting. I would think this is a fairly common
    situation, but I could not find any examples of parsing a file like
    this. Perhaps I'm going about it completely wrong.
    Your recursive traversal function tells you when you're done. If you drop
    the getiterator() bit, reaching the end of parseChild() means that you're
    done with the element and start backing up. So you can simply pass down a
    list of element names that you append() at the beginning of the function
    and pop() at the end, i.e. a stack. That list will then always give you the
    current path from the root node.
    Thanks for the reply. How can I remove getiterator()? Then I won't be
    traversing the nodes of the tree. I can't iterate over tree. I am also
    unclear on where to do the pop(). I tried putting it just after the
    recursive call to parseChild() and I tried putting as the very last
    statement in parseChild() - neither one gave the desired result. Can
    you show me in code what you mean?
    untested:

    nodes = []

    def process_subtree(c, path):
        name = c.get('Name') if c.tag == 'Node' else None
        if name:
            path.append(name)
            nodes.append('/'.join(path))

        for c1 in c:
            process_subtree(c1, path)

        if name:
            path.pop()

    process_subtree(tree.getroot(), [])

    Thanks! This was extremely helpful and I've used these concepts to
    write a script that successfully parses my file.
  • Alister at Nov 26, 2013 at 10:41 am

    On Mon, 25 Nov 2013 18:25:55 -0500, Larry Martell wrote:

    On Mon, Nov 25, 2013 at 6:19 PM, Chris Angelico wrote:

    On Tue, Nov 26, 2013 at 9:45 AM, Larry Martell
    <larry.martell@gmail.com>
    wrote:
    On Monday, November 25, 2013 5:30:44 PM UTC-5, Chris Angelico wrote:

    First off, please clarify: Are there five corresponding </Node> tags
    later on? If not, it's not XML, and nesting will have to be defined
    some other way.
    Yes, there are corresponding </Node> tags. I just didn't show them.
    Good good, I just saw the "unbounded" in your subject line and got
    worried :) I'm pretty sure there's a way to parse that will preserve
    the current nesting information, but others can describe that better
    than I can.
    The term 'unbounded' is used in the XML xsd file like this:

    <xs:sequence maxOccurs="unbounded">

    Secondly, please get off Google Groups. Your initial post is
    malformed, and unless you specifically fight the software, your
    replies will be even more malformed, to the point of being quite
    annoying. There are many other ways to read a newsgroup, or you can
    subscribe to the mailing list python-list at python.org, which carries
    the same content.
    Not sure what you mean by malformed. I don't really care for Google Groups,
    but I've been using it to post to this and other groups for years
    (since rn
    and deja news went away) and no one ever said my posts were
    malformed. In any case, I did not know the group was available as a
    ML. I've subbed to that and will post that way.
    The mailing list works well for me too. Google Groups is deceptively
    easy for a lot of people, but if you look through the list's archives,
    you'll see that the posts it makes are unwrapped (and thus string out
    to the right an arbitrary length), and all quoted text is
    double-spaced, among other problems. Its users are generally unaware of
    this, and like you are not maliciously inflicting that on us all, but
    that doesn't make it any less painful to read :) Thanks for switching.
    I had noticed the double spacing and I always fixed that when I replied.



    if you could now change your male client to send in plane text only we
    would not get this duplicated HTML copy of the post which is just as
    annoying as the double spacing form GG (probably more so).




    --
    <KnaraKat> Bite me.
    * TheOne gets some salt, then proceeds to nibble on KnaraKat a little
              bit....
  • Larry Martell at Nov 26, 2013 at 11:59 am

    On Tue, Nov 26, 2013 at 5:41 AM, Alister wrote:
    On Mon, 25 Nov 2013 18:25:55 -0500, Larry Martell wrote:

    On Mon, Nov 25, 2013 at 6:19 PM, Chris Angelico <rosuav@gmail.com>
    wrote:
    On Tue, Nov 26, 2013 at 9:45 AM, Larry Martell
    <larry.martell@gmail.com>
    wrote:
    On Monday, November 25, 2013 5:30:44 PM UTC-5, Chris Angelico wrote:

    First off, please clarify: Are there five corresponding </Node> tags
    later on? If not, it's not XML, and nesting will have to be defined
    some other way.
    Yes, there are corresponding </Node> tags. I just didn't show them.
    Good good, I just saw the "unbounded" in your subject line and got
    worried :) I'm pretty sure there's a way to parse that will preserve
    the current nesting information, but others can describe that better
    than I can.
    The term 'unbounded' is used in the XML xsd file like this:

    <xs:sequence maxOccurs="unbounded">

    Secondly, please get off Google Groups. Your initial post is
    malformed, and unless you specifically fight the software, your
    replies will be even more malformed, to the point of being quite
    annoying. There are many other ways to read a newsgroup, or you can
    subscribe to the mailing list python-list at python.org, which carries
    the same content.
    Not sure what you mean by malformed. I don't really care for Google Groups,
    but I've been using it to post to this and other groups for years
    (since rn
    and deja news went away) and no one ever said my posts were
    malformed. In any case, I did not know the group was available as a
    ML. I've subbed to that and will post that way.
    The mailing list works well for me too. Google Groups is deceptively
    easy for a lot of people, but if you look through the list's archives,
    you'll see that the posts it makes are unwrapped (and thus string out
    to the right an arbitrary length), and all quoted text is
    double-spaced, among other problems. Its users are generally unaware of
    this, and like you are not maliciously inflicting that on us all, but
    that doesn't make it any less painful to read :) Thanks for switching.
    I had noticed the double spacing and I always fixed that when I replied.

    if you could now change your male client



    What about my female client?

    to send in plane text

    How about plain text?

    only we
    would not get this duplicated HTML copy of the post which is just as
    annoying as the double spacing from GG (probably more so).

    Sorry, didn't realize it was sending in HTML. I had it set to plain
    text, but when the awful gmail update came out it seems to have
    reverted to HTML. Hopefully this is better.
  • Chris Angelico at Nov 26, 2013 at 12:20 pm

    On Tue, Nov 26, 2013 at 10:59 PM, Larry Martell wrote:
    Sorry, didn't realize it was sending in HTML. I had it set to plain
    text, but when the awful gmail update came out it seems to have
    reverted to HTML. Hopefully this is better.

    Yeah, I have the same trouble... but yes, this post looks fine to me.
    (Do consider trimming quoted text, though.) It's text with no HTML
    component.


    ChrisA
  • Alister at Nov 26, 2013 at 12:57 pm

    On 26/11/13 11:59, Larry Martell wrote:
    On Tue, Nov 26, 2013 at 5:41 AM, Alister wrote:
    On Mon, 25 Nov 2013 18:25:55 -0500, Larry Martell wrote:

    On Mon, Nov 25, 2013 at 6:19 PM, Chris Angelico <rosuav@gmail.com>
    wrote:
    On Tue, Nov 26, 2013 at 9:45 AM, Larry Martell
    <larry.martell@gmail.com>
    wrote:
    On Monday, November 25, 2013 5:30:44 PM UTC-5, Chris Angelico wrote:

    First off, please clarify: Are there five corresponding </Node> tags
    later on? If not, it's not XML, and nesting will have to be defined
    some other way.
    Yes, there are corresponding </Node> tags. I just didn't show them.
    Good good, I just saw the "unbounded" in your subject line and got
    worried :) I'm pretty sure there's a way to parse that will preserve
    the current nesting information, but others can describe that better
    than I can.
    The term 'unbounded' is used in the XML xsd file like this:

    <xs:sequence maxOccurs="unbounded">

    Secondly, please get off Google Groups. Your initial post is
    malformed, and unless you specifically fight the software, your
    replies will be even more malformed, to the point of being quite
    annoying. There are many other ways to read a newsgroup, or you can
    subscribe to the mailing list python-list at python.org, which carries
    the same content.
    Not sure what you mean by malformed. I don't really care for Google Groups,
    but I've been using it to post to this and other groups for years
    (since rn
    and deja news went away) and no one ever said my posts were
    malformed. In any case, I did not know the group was available as a
    ML. I've subbed to that and will post that way.
    The mailing list works well for me too. Google Groups is deceptively
    easy for a lot of people, but if you look through the list's archives,
    you'll see that the posts it makes are unwrapped (and thus string out
    to the right an arbitrary length), and all quoted text is
    double-spaced, among other problems. Its users are generally unaware of
    this, and like you are not maliciously inflicting that on us all, but
    that doesn't make it any less painful to read :) Thanks for switching.
    I had noticed the double spacing and I always fixed that when I replied.
    if you could now change your male client
    What about my female client?
    to send in plane text
    How about plain text?
    only we
    would not get this duplicated HTML copy of the post which is just as
    annoying as the double spacing from GG (probably more so).
    Sorry, didn't realize it was sending in HTML. I had it set to plain
    text, but when the awful gmail update came out it seems to have
    reverted to HTML. Hopefully this is better.
    Sorry, typing too quickly without paying attention.
  • Neil Cerutti at Nov 26, 2013 at 3:27 pm

    On Mon, Nov 25, 2013 at 5:22 PM, Larry.Martell at gmail.com wrote:
    I have an XML file that has an element called "Node". These can
    be nested to any depth and the depth of the nesting is not
    known to me. I need to parse the file and preserve the nesting.
    For example, if the XML file had:

    <Node Name="A">
    <Node Name="B">
    <Node Name="C">
    <Node Name="D">
    <Node Name="E">

    When I'm parsing Node "E" I need to know I'm in A/B/C/D/E.
    Problem is I don't know how deep this can be. This is the code
    I have so far:

    I'm also an ElementTree user, but it's fairly heavy-duty for simple
    jobs. I use sax for those. In fact, I'm kind of a saxophone.
    This is basically the same idea as others have posted.


    import io
    import sys
    import xml.sax as sax

    the_xml = """<?xml version="1.0" encoding="ISO-8859-1"?>
    <Node Name="A">
        <Node Name="B">
           <Node Name="C">
             <Node Name="D">
               <Node Name="E">
               </Node></Node></Node></Node></Node>"""


    class NodeHandler(sax.handler.ContentHandler):
        def startDocument(self):
            # Names of the elements that are currently open, outermost first.
            self.names = []

        def startElement(self, name, attrs):
            # Process before pushing, so self.names holds only the ancestors.
            self.process(attrs['Name'])
            self.names.append(attrs['Name'])

        def process(self, name):
            print("Node {} Nest {}".format(name, '/'.join(self.names)))
            # Do your stuff.

        def endElement(self, name):
            # Leaving an element removes one level of nesting.
            self.names.pop()


    print(sys.version_info)
    # sax.parse() returns None; the handler does all the work.
    sax.parse(io.StringIO(the_xml), NodeHandler())


    Output:
    sys.version_info(major=3, minor=3, micro=2, releaselevel='final', serial=0)
    Node A Nest
    Node B Nest A
    Node C Nest A/B
    Node D Nest A/B/C
    Node E Nest A/B/C/D
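
    For comparison, here is a rough sketch of the same idea with
    ElementTree, which the original question was using. Instead of one
    shared list that has to be popped, each recursive call gets its own
    tuple of ancestor names, so a level of nesting disappears by itself
    when the call returns. The helper name walk, the sample_xml string,
    the extra sibling Node "F" and the '?' fallback for a missing Name
    attribute are all assumptions made for illustration, not part of the
    original post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the nesting from the question, plus a sibling
# "F" under "A" to show that the path shrinks again after leaving "B".
sample_xml = """<Node Name="A">
    <Node Name="B">
        <Node Name="C">
            <Node Name="D">
                <Node Name="E"/>
            </Node>
        </Node>
    </Node>
    <Node Name="F"/>
</Node>"""


def walk(element, parents=()):
    """Yield the slash-joined path of every Node element, depth first."""
    if element.tag == 'Node':
        # Build a new tuple rather than mutating shared state, so nothing
        # needs to be undone when this call returns.
        parents = parents + (element.get('Name', '?'),)
        yield '/'.join(parents)
    for child in element:
        yield from walk(child, parents)


root = ET.fromstring(sample_xml)
paths = list(walk(root))
for path in paths:
    print(path)
```

    Elements with a tag other than Node are passed through with the
    path unchanged, so Nodes nested inside them still report their Node
    ancestors. For very large files a streaming parser such as the sax
    handler above is likely to use less memory, since ElementTree builds
    the whole tree first.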


    --
    Neil Cerutti

Discussion Overview
group: python-list @
categories: python
posted: Nov 25, '13 at 10:22p
active: Nov 28, '13 at 1:31p
posts: 15
users: 6
website: python.org
