Hello,
I have a question, please.

How can I read a file in a UDF in Pig?

For example: A = load 'xmlFiles' using myXMLParser(xmlfile);

Can I do something like that, so that I can parse the XML file using some
Java library?

Thanks for your help,

Baraa
--


  • William Dowling at Sep 14, 2011 at 2:53 pm
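    I do this:

    -- ship the Python script to the task nodes so records can be streamed through it
    define analyze_unif `analyze_unif_recs.py`
        input (stdin)
        output (stdout USING PigStreaming(','))
        ship ('$scriptDir/analyze_unif_recs.py');

    -- each <REC> element becomes one chararray tuple
    -- (XMLLoader lives in piggybank.jar, which must be registered)
    UnifLines = load '$unif_xml'
        using org.apache.pig.piggybank.storage.XMLLoader('REC')
        as (doc:chararray);

    -- the script writes comma-separated (docid, xml_comp) rows to stdout
    UnifXmlByDocId = stream UnifLines through analyze_unif
        as (docid:int, xml_comp:chararray);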

    Here analyze_unif_recs.py is a Python script I wrote that does the XML parsing, and org.apache.pig.piggybank.storage.XMLLoader('REC') finds the <REC> elements in the XML input, which are then passed to my script.
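For illustration, a streaming script of this kind could look roughly like the sketch below. It is not the original analyze_unif_recs.py, which was not posted. The child element names docid and comp are hypothetical, and the sketch assumes that each <REC> record arrives on a single line of stdin and that the emitted XML contains no commas, since PigStreaming(',') splits fields on them.

```python
import sys
import xml.etree.ElementTree as ET


def parse_record(line):
    """Turn one <REC>...</REC> string into (docid, xml_comp), or None if unusable."""
    try:
        rec = ET.fromstring(line)
    except ET.ParseError:
        return None  # skip malformed records instead of aborting the stream
    docid = rec.findtext("docid")  # hypothetical child element
    comp = rec.find("comp")        # hypothetical child element
    if docid is None or comp is None:
        return None
    try:
        docid = int(docid)
    except ValueError:
        return None  # docid is declared as int on the Pig side
    xml_comp = ET.tostring(comp, encoding="unicode").strip()
    return docid, xml_comp


def stream_records(lines):
    """Yield one 'docid,xml_comp' output line per parseable input line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        parsed = parse_record(line)
        if parsed is not None:
            yield "%d,%s" % parsed


def main():
    # Pig writes one tuple per line to stdin and reads rows back from stdout.
    for out_line in stream_records(sys.stdin):
        print(out_line)

# The shipped script would end with a call to main().
```

Keeping the parsing in parse_record, separate from the stdin loop, makes it possible to try the script on a few sample records locally before running it through Pig.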


    William F Dowling
    Sr Technical Specialist, Software Engineering
    Thomson Reuters
    0 +1 215 823 3853


    -----Original Message-----
    From: Baraa Mohamad
    Sent: Wednesday, September 14, 2011 10:41 AM
    To: user@pig.apache.org
    Subject: reading xml file within a UDF

  • Baraa Mohamad at Sep 14, 2011 at 3:26 pm
    Thank you for your reply.
    So can I do the same with Java code?
    To be clearer: I have a folder with multiple XML files that I want to
    read and parse in order to extract the values of some attributes (att1, att2).

    For example:
    <elem att1="452" att2="7587">elem1</elem>

    Thanks
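As a rough sketch of the attribute extraction described here, written in Python to match the streaming approach above rather than in Java: the function name extract_attributes and its parameters are made up for illustration, and it assumes the attribute values are quoted, as well-formed XML requires.

```python
import xml.etree.ElementTree as ET


def extract_attributes(xml_text, tag="elem", names=("att1", "att2")):
    """Return one tuple of attribute values per <tag> element in xml_text.

    A missing attribute is reported as None.
    """
    root = ET.fromstring(xml_text)
    rows = []
    # iter() visits the root itself and every descendant with the given tag
    for node in root.iter(tag):
        rows.append(tuple(node.get(name) for name in names))
    return rows
```

Each returned tuple could then be written out as one delimited line, which is the shape Pig expects back from a streaming script.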
