Hi - I'm hoping to get some guidance

I am trying to parse a structure that looks like:

{urn:FindingImport}TOOL - GD
{urn:FindingImport}TOOL_VERSION - 2.0.8.8
{urn:FindingImport}AUTHENTICATED_FINDING - TRUE
{urn:FindingImport}GD_VUL_NAME - Rename Built-in Guest Account
{urn:FindingImport}GD_SEVERITY - 2
{urn:FindingImport}FINDING - None
{urn:FindingImport}FINDING_ID - V0001115
{urn:FindingImport}FINDING_STATUS - NF
{urn:FindingImport}TOOL - GD
{urn:FindingImport}TOOL_VERSION - 2.0.8.8
{urn:FindingImport}AUTHENTICATED_FINDING - TRUE
{urn:FindingImport}GD_VUL_NAME - Rename Built-in Administrator Account
{urn:FindingImport}GD_SEVERITY - 2
{urn:FindingImport}FINDING - None
{urn:FindingImport}FINDING_ID - V0001117

This is the result when the original data is run through 'for element in
root.iter():' as described in the lxml tutorial. This structure repeats
many times in the document with different values after each tag. I want
to take the values and place them in one csv line for each structure in
the file. The closest I have come is something like (but doesn't work):

for element in root.iter("{urn:FindingImport}TOOL"):
    print element.text
    print element.getnext().text
    print element.getnext().text

The initial print element.tag and the first element.getnext().text work as
I would like, but I am not finding a way to parse past that. The second
element.getnext().text returns the value for the same tag as the one prior
to it. I know I am missing something, but don't see it. Any assistance
is appreciated.

Thanks,

marc

  • Stefan Behnel at Apr 6, 2009 at 6:49 am

    marc at marcd.org wrote:
    > I am trying to parse a structure that looks like:
    >
    > {urn:FindingImport}TOOL - GD
    > {urn:FindingImport}TOOL_VERSION - 2.0.8.8
    > {urn:FindingImport}AUTHENTICATED_FINDING - TRUE
    > {urn:FindingImport}GD_VUL_NAME - Rename Built-in Guest Account
    > {urn:FindingImport}GD_SEVERITY - 2
    > {urn:FindingImport}FINDING - None
    > {urn:FindingImport}FINDING_ID - V0001115
    > {urn:FindingImport}FINDING_STATUS - NF
    > {urn:FindingImport}TOOL - GD
    > {urn:FindingImport}TOOL_VERSION - 2.0.8.8
    > {urn:FindingImport}AUTHENTICATED_FINDING - TRUE
    > {urn:FindingImport}GD_VUL_NAME - Rename Built-in Administrator Account
    > {urn:FindingImport}GD_SEVERITY - 2
    > {urn:FindingImport}FINDING - None
    > {urn:FindingImport}FINDING_ID - V0001117
    >
    > This is the result when the original data is run through 'for element in
    > root.iter():' as described in the lxml tutorial.

    Note that this does not give you the "structure" (i.e. the hierarchy of
    elements) but only the plain elements in document order. XML is a tree
    structure that has elements at the same level and child-parent
    relationships between elements at different hierarchy levels.
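
To make the difference concrete, here is a minimal sketch that contrasts the flat document-order listing with a walk that keeps track of nesting. The sample document and the RECORD and IMPORT element names are invented for illustration, since the thread does not show how the real file is nested. The import falls back to the standard library's ElementTree, which offers the same calls used here.

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same calls

NS = "{urn:FindingImport}"

# Hypothetical sample: IMPORT and RECORD are made-up container names.
SAMPLE = """
<IMPORT xmlns="urn:FindingImport">
  <RECORD>
    <TOOL>GD</TOOL>
    <FINDING><FINDING_ID>V0001115</FINDING_ID></FINDING>
  </RECORD>
</IMPORT>
""".strip()


def flat_tags(root):
    """Tags in document order, as root.iter() yields them (no nesting info)."""
    return [el.tag[len(NS):] for el in root.iter()]


def outline(element, depth=0):
    """(depth, tag) pairs, recovered by walking the children level by level."""
    rows = [(depth, element.tag[len(NS):])]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


root = etree.fromstring(SAMPLE)
print(flat_tags(root))
for depth, tag in outline(root):
    print("  " * depth + tag)
```

Running the outline on the real file should show which element actually encloses each repeating group of fields.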

    > This structure repeats
    > many times in the document with different values after each tag. I want
    > to take the values and place them in one csv line for each structure in
    > the file. The closest I have come is something like (but doesn't work):
    >
    > for element in root.iter("{urn:FindingImport}TOOL"):
    >     print element.text
    >     print element.getnext().text
    >     print element.getnext().text
    >
    > The initial print element.tag and the first element.getnext().text work as
    > I would like, but I am not finding a way to parse past that. The second
    > element.getnext().text returns the value for the same tag as the one prior
    > to it.

    .getnext() returns the next sibling of the element it is called on, not its
    child, so calling it twice on the same element returns the same sibling
    both times. I assume that "TOOL" is the top-level element of the repeating
    subtree that you want to extract here. In that case, you can use e.g.

    element.find("{urn:FindingImport}GD_VUL_NAME")

    to retrieve the subelement named 'GD_VUL_NAME', or

    element.findtext("{urn:FindingImport}GD_VUL_NAME")

    to retrieve its text content directly.
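
As a rough sketch of how findtext() could produce one CSV line per structure: the sample below follows the assumption that TOOL is the parent of the other fields, which may not match the real file. If the fields turn out to be siblings of TOOL, the same calls would be made on their shared parent instead. The IMPORT root name, the FIELDS selection and the rows_from helper are all illustrative.

```python
import csv
import io

try:
    from lxml import etree  # the library used in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same calls

NS = "{urn:FindingImport}"
FIELDS = ["GD_VUL_NAME", "GD_SEVERITY", "FINDING_ID", "FINDING_STATUS"]

# Hypothetical layout: TOOL is assumed to enclose the other fields.
SAMPLE = """
<IMPORT xmlns="urn:FindingImport">
  <TOOL>
    <GD_VUL_NAME>Rename Built-in Guest Account</GD_VUL_NAME>
    <GD_SEVERITY>2</GD_SEVERITY>
    <FINDING_ID>V0001115</FINDING_ID>
    <FINDING_STATUS>NF</FINDING_STATUS>
  </TOOL>
  <TOOL>
    <GD_VUL_NAME>Rename Built-in Administrator Account</GD_VUL_NAME>
    <GD_SEVERITY>2</GD_SEVERITY>
    <FINDING_ID>V0001117</FINDING_ID>
  </TOOL>
</IMPORT>
""".strip()


def rows_from(root):
    """One list of field values per TOOL element; missing fields become ''."""
    return [
        [tool.findtext(NS + name, default="") for name in FIELDS]
        for tool in root.iter(NS + "TOOL")
    ]


root = etree.fromstring(SAMPLE)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(FIELDS)             # header line
writer.writerows(rows_from(root))   # one line per TOOL element
print(buffer.getvalue())
```

Passing default="" keeps every row the same width even when a field is absent, which the csv module would otherwise not guarantee.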

    You should also take a look at lxml.objectify, which provides a very handy
    way to deal with the kind of XML that you have here. It will allow you to
    do this:

    for tool in root.iter("{urn:FindingImport}TOOL"):
        print tool.GD_VUL_NAME, tool.FINDING

    BTW, if all you want is to map the XML to CSV, without any major
    restructuring in between, take a look at iterparse(). It works a lot like
    the .iter() method, but iterates during parsing, which allows you to delete
    subtrees after use to save memory.
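
A minimal sketch of the iterparse() approach, again assuming that TOOL encloses the fields of one structure (an assumption, not something the thread confirms). The sample bytes, the FIELDS selection and the stream_rows helper are illustrative. Each TOOL subtree is written out as soon as its end tag has been parsed and is then cleared.

```python
import csv
import io

try:
    from lxml import etree  # the library used in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same calls

NS = "{urn:FindingImport}"
FIELDS = ["GD_VUL_NAME", "GD_SEVERITY", "FINDING_ID"]

# Hypothetical layout: TOOL is assumed to enclose the other fields.
SAMPLE = b"""<IMPORT xmlns="urn:FindingImport">
  <TOOL>
    <GD_VUL_NAME>Rename Built-in Guest Account</GD_VUL_NAME>
    <GD_SEVERITY>2</GD_SEVERITY>
    <FINDING_ID>V0001115</FINDING_ID>
  </TOOL>
  <TOOL>
    <GD_VUL_NAME>Rename Built-in Administrator Account</GD_VUL_NAME>
    <GD_SEVERITY>2</GD_SEVERITY>
    <FINDING_ID>V0001117</FINDING_ID>
  </TOOL>
</IMPORT>"""


def stream_rows(source, out):
    """Write one CSV line per TOOL element while parsing; return the row count."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    count = 0
    # "end" events fire once an element and all of its children are parsed.
    for _event, element in etree.iterparse(source, events=("end",)):
        if element.tag == NS + "TOOL":
            writer.writerow(
                [element.findtext(NS + name, default="") for name in FIELDS]
            )
            element.clear()  # drop the children that were just written out
            count += 1
    return count


out = io.StringIO()
total = stream_rows(io.BytesIO(SAMPLE), out)
print(out.getvalue())
```

For a real file, the BytesIO object would be replaced by a file opened in binary mode or by a filename.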

    Stefan

Group: tutor @
Category: python
Posted: Apr 4, '09 at 1:05a
Active: Apr 6, '09 at 6:49a
Posts: 2
Users: 2
Website: python.org
