Hello,

I am new to Python and, as a first project, decided to try to parse an XML
report using Python. I have the following, which works to extract one
element. I am stuck, however, at one element. I want to extract several
different elements per line, creating a comma-separated values (CSV)
line that can be imported into a spreadsheet. Not all elements are in each
line or part of the XML document, so if an element is not in a line, I
would leave a blank (two commas). I can probably figure that out; it's the
extracting multiple elements and putting them in one line that has me
stumped. Help would be greatly appreciated. Thank you. What I have so
far (and I would like to stick to the DOM model):

import xml.dom.minidom
import sys


datasource=open(sys.argv[1])
domDatasource=xml.dom.minidom.parse(datasource)

def getText(nodelist):
    # Concatenate the data of every text node in the list
    rc=""
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc=rc+node.data
    return rc

def HandleStatus(Finding):
    # Print the text of each element passed in, one per line
    for Status in Finding:
        print(getText(Status.childNodes))

HandleStatus(domDatasource.getElementsByTagName("FINDING_STATUS"))

domDatasource.unlink()

An excerpt of the xml file:



</SCRIPT_RESULTS><TOOL>GD</TOOL><TOOL_VERSION>2.0.8.8</TOOL_VERSION><AUTHENTICATED_FINDING>TRUE</AUTHENTICATED_FINDING><GD_VUL_NAME>DTBI134-Allow
paste operations via
scripts-Restric</GD_VUL_NAME><GD_SEVERITY>2</GD_SEVERITY></FINDING><FINDING><FINDING_ID
TYPE="VK">V0006310</FINDING_ID><FINDING_STATUS>NF</FINDING_STATUS><TOOL>GD</TOOL><TOOL_VERSION>2.0.8.8</TOOL_VERSION><AUTHENTICATED_FINDING>TRUE</AUTHENTICATED_FINDING><GD_VUL_NAME>DTBI135-Scripting
of Java applets -
Restricted</GD_VUL_NAME><GD_SEVERITY>2</GD_SEVERITY></FINDING><FINDING><FINDING_ID
TYPE="VK">V0006311</FINDING_ID><FINDING_STATUS>O</FINDING_STATUS><FINDING_DETAILS
OVERRIDE="O">The value:
Software\Policies\Microsoft\Windows\CurrentVersion\Internet
Settings\Zones\4\1A00 does not exist.

</FINDING_DETAILS><SCRIPT_RESULTS>The value:
Software\Policies\Microsoft\Windows\CurrentVersion\Internet
Settings\Zones\4\1A00 does not exist.

</SCRIPT_RESULTS><TOOL>GD</TOOL><TOOL_VERSION>2.0.8.8</TOOL_VERSION><AUTHENTICATED_FINDING>TRUE</AUTHENTICATED_FINDING><GD_VUL_NAME>DTBI136-User
Authentication - Logon -
Restricted</GD_VUL_NAME><GD_SEVERITY>2</GD_SEVERITY></FINDING><FINDING><FINDING_ID
TYPE="VK">V0006312</FINDING_ID><FINDING_STATUS>NF</FINDING_STATUS><TOOL>GD</TOOL><TOOL_VERSION>2.0.8.8</TOOL_VERSION><AUTHENTICATED_FINDING>TRUE</AUTHENTICATED_FINDING><GD_VUL_NAME>DTBI150-Microsoft
Java VM is
installed</GD_VUL_NAME><GD_SEVERITY>2</GD_SEVERITY></FINDING><FINDING><FINDING_ID
TYPE="VK">V0006313</FINDING_ID><FINDING_STATUS>NF</FINDING_STATUS><TOOL>GD</TOOL><TOOL_VERSION>2.0.8.8</TOOL_VERSION><AUTHENTICATED_FINDING>TRUE</AUTHENTICATED_FINDING><GD_VUL_NAME>DTBI151-Cipher
setting for DES 56/56 not
set</GD_VUL_NAME><GD_SEVERITY>2</GD_SEVERITY></FINDING><FINDING><FINDING_ID
TYPE="VK">V0006314</FINDING_ID><FINDING_STATUS>NF</FINDING_STATUS><TOOL>GD</TOOL><TOOL_VERSION>2.0.8.8</TOOL_VERSION><AUTHENTICATED_FINDING>TRUE</AUTHENTICATED_FINDING><GD_VUL_NAME>DTBI152-Cipher
setting for Null is not
set</GD_VUL_NAME><GD_SEVERITY>2</GD_SEVERITY></FINDING><FINDING><FINDING_ID
TYPE="VK">V0006315</FINDING_ID><FINDING_STATUS>NF</FINDING_STATUS><TOOL>GD</TOOL><TOOL_VERSION>2.0.8.8</TOOL_VERSION><AUTHENTICATED_FINDING>TRUE</AUTHENTICATED_FINDING><GD_VUL_NAME>DTBI153-Cipher
setting for Triple DES is not
set</GD_VUL_NAME><GD_SEVERITY>2</GD_SEVERITY></FINDING><FINDING><FINDING_ID
TYPE="VK">V0006316</FINDING_ID><FINDING_STATUS>O</FINDING_STATUS><FINDING_DETAILS
OVERRIDE="O">The value:
SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Hashes\SHA\Enabled
does not exist.

</FINDING_DETAILS><SCRIPT_RESULTS>The value:
SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Hashes\SHA\Enabled
does not exist.

</SCRIPT_RESULTS><TOOL>GD</TOOL><TOOL_VERSION>2.0.8.8</TOOL_VERSION><AUTHENTICATED_FINDING>TRUE</AUTHENTICATED_FINDING><GD_VUL_NAME>DTBI160-Hash
setting for SHA is not set
properly</GD_VUL_NAME><GD_SEVERITY>2</GD_SEVERITY></FINDING><FINDING><FINDING_ID
TYPE="VK">V0006317</FINDING_ID><FINDING_STATUS>NF</FINDING_STATUS><TOOL>GD</TOOL><TOOL_VERSION>2.0.8.8</TOOL_VERSION><AUTHENTICATED_FINDING>TRUE</AUTHENTICATED_FINDING><GD_VUL_NAME>DTBG007-IE
is not capable to use 128-bit
encryptio</GD_VUL_NAME><GD_SEVERITY>2</GD_SEVERITY></FINDING><FINDING><FINDING_ID
TYPE="VK">V0006318</FINDING_ID><FINDING_STATUS>O</FINDING_STATUS><FINDING_DETAILS
OVERRIDE="O">The key:
SOFTWARE\Microsoft\SystemCertificates\Root\Certificates\10F193F340AC91D6DE5F1EDC006247C4F25D9671
does not exist.

</FINDING_DETAILS><SCRIPT_RESULTS>The key:
SOFTWARE\Microsoft\SystemCertificates\Root\Certificates\10F193F340AC91D6DE5F1EDC006247C4F25D9671
does not exist.

</SCRIPT_RESULTS><TOOL>GD</TOOL><TOOL_VERSION>2.0.8.8</TOOL_VERSION><AUTHENTICATED_FINDING>TRUE</AUTHENTICATED_FINDING><GD_VUL_NAME>DTBG010-DoD
Root Certificate is not
installed</GD_VUL_NAME><GD_SEVERITY>2</GD_SEVERITY></FINDING><FINDING><FINDING_ID
TYPE="VK">V0006319</FINDING_ID><FINDING_STATUS>NA</FINDING_STATUS><TOOL>GD</TOOL><TOOL_VERSION>2.0.8.8</TOOL_VERSION><AUTHENTICATED_FINDING>TRUE</AUTHENTICATED_FINDING><GD_VUL_NAME>DTBI140-Error
Reporting tool is installed or
enabl</GD_VUL_NAME><GD_SEVERITY>2</GD_SEVERITY></FINDING><FINDING><FINDING_ID
TYPE="VK">V0007006</FINDING_ID><FINDING_STATUS>O</FINDING_STATUS><FINDING_DETAILS
OVERRIDE="O">The value: Software\Microsoft\Internet
Explorer\Main\AutoSearch does not exist.

</FINDING_DETAILS><SCRIPT_RESULTS>The value: Software\Microsoft\Internet
Explorer\Main\AutoSearch does not exist.


  • Stefan Behnel at Mar 12, 2009 at 7:47 pm

    marc at marcd.org wrote: [snip]

    There is another "DOM Model" in the stdlib. It's called ElementTree and is
    generally a lot easier to use. For example, to find the text content of an
    element called "element_that_has_text_content" in a subtree below
    "some_element", you can do

    print(some_element.findtext(".//element_that_has_text_content"))
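
As a rough sketch of how findtext() might be applied to the report in the question: the snippet below wraps a shortened fragment modelled on the excerpt in a hypothetical <REPORT> root element (the real root is not shown in this thread) and collects a fixed list of child tags for each FINDING. The FIELDS list, the finding_rows() helper and the sample data are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Shortened sample modelled on the excerpt; the <REPORT> root is hypothetical.
SAMPLE = (
    "<REPORT>"
    "<FINDING><FINDING_ID TYPE=\"VK\">V0006310</FINDING_ID>"
    "<FINDING_STATUS>NF</FINDING_STATUS><GD_SEVERITY>2</GD_SEVERITY></FINDING>"
    "<FINDING><FINDING_ID TYPE=\"VK\">V0006311</FINDING_ID>"
    "<FINDING_STATUS>O</FINDING_STATUS>"
    "<FINDING_DETAILS OVERRIDE=\"O\">The value does not exist.</FINDING_DETAILS>"
    "<GD_SEVERITY>2</GD_SEVERITY></FINDING>"
    "</REPORT>"
)

# Columns to extract, in output order (an assumed subset of the tags).
FIELDS = ["FINDING_ID", "FINDING_STATUS", "FINDING_DETAILS", "GD_SEVERITY"]


def finding_rows(root):
    """Return one list of field values per FINDING element."""
    rows = []
    for finding in root.iter("FINDING"):
        # findtext() returns the default when the child element is missing,
        # which gives the empty column the question asks for.
        rows.append([finding.findtext(tag, default="") for tag in FIELDS])
    return rows


rows = finding_rows(ET.fromstring(SAMPLE))
for row in rows:
    print(",".join(row))
```

The first finding has no FINDING_DETAILS child, so its row simply carries an empty field in that position.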

    Stefan
  • Dave Kuhlman at Mar 12, 2009 at 8:14 pm

    On Thu, Mar 12, 2009 at 08:47:24PM +0100, Stefan Behnel wrote:
    marc at marcd.org wrote: [snip]
    There is another "DOM Model" in the stdlib. It's called ElementTree and is
    generally a lot easier to use. For example, to find the text content of an
    element called "element_that_has_text_content" in a subtree below
    "some_element", you can do

    print(some_element.findtext(".//element_that_has_text_content"))

    And, if you install lxml, then you will be able to use XPath, which
    is more powerful than the findtext() in ElementTree.

    Stefan did not tell you about that because he is a developer who
    has helped give us lxml, and perhaps he is a bit modest.

    There is a bit to learn in order to use the XPath capability in
    lxml. But, if you are doing any amount of XML processing in
    Python, it's likely to be worth it.

    You can learn about lxml here: http://codespeak.net/lxml/
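
To give a feel for the extra XPath power, here is a minimal sketch that assumes lxml is installed (it is a third-party package; the guard below simply skips the query when it is missing). The sample data, the hypothetical <REPORT> root and the cipher_finding_ids() helper are made up for illustration. The query uses the XPath contains() function and a text() step, neither of which is part of the limited path syntax in the standard library's ElementTree.

```python
try:
    from lxml import etree  # third-party package, installed separately
except ImportError:
    etree = None

# Shortened sample modelled on the excerpt; the <REPORT> root is hypothetical.
SAMPLE = (
    "<REPORT>"
    "<FINDING><FINDING_ID TYPE='VK'>V0006313</FINDING_ID>"
    "<GD_VUL_NAME>DTBI151-Cipher setting for DES 56/56 not set</GD_VUL_NAME>"
    "</FINDING>"
    "<FINDING><FINDING_ID TYPE='VK'>V0006317</FINDING_ID>"
    "<GD_VUL_NAME>DTBG007-IE is not capable to use 128-bit encryptio</GD_VUL_NAME>"
    "</FINDING>"
    "</REPORT>"
)


def cipher_finding_ids(xml_text):
    """Return the IDs of findings whose name mentions 'Cipher'."""
    root = etree.fromstring(xml_text)
    # contains() filters on the child's text; text() returns plain strings.
    return root.xpath(
        "//FINDING[contains(GD_VUL_NAME, 'Cipher')]/FINDING_ID/text()"
    )


# Only run the query when lxml is actually available.
cipher_ids = cipher_finding_ids(SAMPLE) if etree is not None else None
print(cipher_ids)
```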

    - Dave
  • Moos Heintzen at Mar 13, 2009 at 12:54 am
    So you want one line for each <FINDING> element? Easy:

    # Get <FINDING> elements
    findings = domDatasource.getElementsByTagName('FINDING')

    # Get the text of all direct child nodes in each element.
    # This assumes every <FINDING> child has a TEXT_NODE node.
    lines = []
    for finding in findings:
        lines.append([f.firstChild.data for f in finding.childNodes])

    # Print one comma-separated line per finding
    for line in lines:
        print(", ".join(line))

    Not sure how you want to deal with newlines. You can escape them to \n
    in the output, or you might find something in the csv module. (I
    haven't looked at it.)
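
On the newline question, a small sketch of what the standard csv module does, with made-up rows standing in for real findings: its default dialect wraps any field that contains a newline or a comma in double quotes, so the multi-line details stay in a single cell when the file is imported. The rows and variable names here are assumptions for illustration.

```python
import csv
import io

# Example rows; the last field of the second row has embedded newlines.
rows = [
    ["V0006310", "NF", ""],
    ["V0006311", "O", "The value:\nSettings\\Zones\\4\\1A00 does not exist.\n"],
]

# Write to an in-memory buffer; a real script would open a file
# with newline="" instead.
buffer = io.StringIO()
writer = csv.writer(buffer)  # default dialect quotes fields only when needed
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back recovers the original fields, newlines included.
parsed = list(csv.reader(io.StringIO(text, newline="")))
```

Missing elements are written as empty fields, which gives the two adjacent commas asked for in the question.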

    Now this doesn't deal with missing elements. I found some have 7, and
    others have 9. You might be able to insert two empty elements in lines
    with length 7.

    Or, if you want to have more control, you can make a dictionary with
    keys of all available tag names, and for each element found in
    <FINDING>, insert it in the dictionary (if it's a valid tag name).

    Then you have a list of dictionaries, and you can print the elements
    in any order you want. Missing elements will have empty strings as
    values.
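
A minimal sketch of the dictionary idea described above, staying with minidom as the question requested. The ELEMENTS list, the helper names get_text() and finding_to_dict(), the hypothetical <REPORT> root and the sample data are all assumptions for illustration.

```python
import xml.dom.minidom

# Shortened sample modelled on the excerpt; the <REPORT> root is hypothetical.
SAMPLE = (
    "<REPORT>"
    "<FINDING><FINDING_ID TYPE=\"VK\">V0006310</FINDING_ID>"
    "<FINDING_STATUS>NF</FINDING_STATUS><TOOL>GD</TOOL></FINDING>"
    "<FINDING><FINDING_ID TYPE=\"VK\">V0006311</FINDING_ID>"
    "<FINDING_STATUS>O</FINDING_STATUS>"
    "<FINDING_DETAILS OVERRIDE=\"O\">The value does not exist.</FINDING_DETAILS>"
    "<TOOL>GD</TOOL></FINDING>"
    "</REPORT>"
)

# Tag names to report, in the desired column order (an assumed subset).
ELEMENTS = ["FINDING_ID", "FINDING_STATUS", "FINDING_DETAILS", "TOOL"]


def get_text(node):
    """Concatenate the data of all text nodes directly under node."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


def finding_to_dict(finding):
    """Map every known tag name to its text, defaulting to an empty string."""
    record = dict.fromkeys(ELEMENTS, "")
    for child in finding.childNodes:
        # Skip stray text nodes and any tags that are not in ELEMENTS.
        if child.nodeType == child.ELEMENT_NODE and child.tagName in record:
            record[child.tagName] = get_text(child)
    return record


dom = xml.dom.minidom.parseString(SAMPLE)
records = [finding_to_dict(f) for f in dom.getElementsByTagName("FINDING")]
dom.unlink()

# Columns come out in ELEMENTS order, whatever the order in the XML.
for record in records:
    print(", ".join(record[name] for name in ELEMENTS))
```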

    Moos
  • Moos Heintzen at Mar 13, 2009 at 1:40 am
    I'm a little bored, so I wrote a function that takes a <FINDING> element
    and puts its child elements in a dictionary. Missing elements are just
    empty strings.

    http://gist.github.com/78385

    Usage:
    >>> d = process_finding(findings[0])
    >>> ", ".join(map(lambda e: d[e], elements))
    u'V0006310, NF, , , GD, 2.0.8.8, TRUE, DTBI135-Scripting\nof Java applets -\nRestricted, 2'

    Now for a <FINDING> of 9 elements:
    >>> d = process_finding(findings[1])
    >>> ", ".join(map(lambda e: d[e], elements))
    u'V0006311, O, The value:\nSoftware\\Policies\\Microsoft\\Windows\\CurrentVersion\\Internet\nSettings\\Zones\\4\\1A00 does not exist.\n\n, The value:\nSoftware\\Policies\\Microsoft\\Windows\\CurrentVersion\\Internet\nSettings\\Zones\\4\\1A00 does not exist.\n\n, GD, 2.0.8.8, TRUE, DTBI136-User\nAuthentication - Logon -\nRestricted, 2'

    The map() function just looks up each name from the elements list in
    the dictionary. You can reorder them any way you want.

    You're welcome :)

    Moos

Discussion Overview
group: tutor @
categories: python
posted: Mar 10, '09 at 11:02p
active: Mar 13, '09 at 1:40a
posts: 5
users: 4
website: python.org
