Grokbase Groups: Hive user, June 2010
Thread: Load data from xml using Mapper.py in hive
Hi
I have created a table in Hive (suppose table1, with two columns, col1 and
col2).

Now I have an XML file, for which I have written a Python script that reads
the XML file and transforms it into single rows with tab-separated values.
E.g. the output of the Python script can be

row 1 = val1 val2
row 2 = val3 val4

So the output file has plain rows, thanks to the Python script. Now I want
to load this into the created table. I have seen the example in which the
data is first loaded into the u_data table and then transformed into
u_data_new using a Python script (sketched below), but it does not fit my
scenario, as I have an XML file as the source.
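
For reference, the u_data example mentioned above follows this pattern (a
sketch based on the Hive getting-started tutorial; weekday_mapper.py and
the column names come from that tutorial, not from this thread):

add FILE weekday_mapper.py;

INSERT OVERWRITE TABLE u_data_new
SELECT
  TRANSFORM (userid, movieid, rating, unixtime)
  USING 'python weekday_mapper.py'
  AS (userid, movieid, rating, weekday)
FROM u_data;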


Kindly let me know: can I achieve this?
Thanks

--
Regards
Shuja-ur-Rehman Baig
_________________________________
MS CS - School of Science and Engineering
Lahore University of Management Sciences (LUMS)
Sector U, DHA, Lahore, 54792, Pakistan
Cell: +92 3214207445


  • Ashish Thusoo at Jun 10, 2010 at 1:07 am
    You could load this whole XML file into a table with a single row and a single column. The default record delimiter is \n, but you can create a table where the record delimiter is \001. Once you do that, you can follow the approach that you described. Will this solve your problem?

    Ashish

  • Shuja Rehman at Jun 10, 2010 at 11:38 am
    Hi
    I have tried to do as you described. Let me explain in steps.

    1- create table test (xmlFile String);
    ----------------------------------------------------------------------------------

    2-LOAD DATA LOCAL INPATH '1.xml'
    OVERWRITE INTO TABLE test;
    ----------------------------------------------------------------------------------

    3-CREATE TABLE test_new (
    b STRING,
    c STRING
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t';

    ----------------------------------------------------------------------------------
    4-add FILE sampleMapper.groovy;
    ----------------------------------------------------------------------------------
    5- INSERT OVERWRITE TABLE test_new
    SELECT
    TRANSFORM (xmlfile)
    USING 'sampleMapper.groovy'
    AS (b,c)
    FROM test;
    ----------------------------------------------------------------------------------
    *XML FILE*:
    The XML file has only one row, for testing purposes, which is

    <xy><a><b>Hello</b><c>world</c></a></xy>
    ----------------------------------------------------------------------------------
    *MAPPER*
    I have written the mapper in Groovy to parse it. The mapper is

    // read the (single) input row that Hive streams to this script
    def xmlData = ""
    System.in.withReader {
        xmlData = xmlData + it.readLine()
    }

    // parse the XML and print the two fields as one tab-separated output row
    def xy = new XmlParser().parseText(xmlData)
    def b = xy.a.b.text()
    def c = xy.a.c.text()
    println([b, c].join('\t'))
    ----------------------------------------------------------------------------------
    Now steps 1-4 are fine, but when I perform step 5, which loads the data
    from the test table into the new table using the mapper, it throws an
    error. The error on the console is

    *FAILED: Execution Error, return code 2 from
    org.apache.hadoop.hive.ql.exec.ExecDriver*

    I am facing a hard time. Any suggestions?
    Thanks
  • Sonal Goyal at Jun 10, 2010 at 11:43 am
    Can you try changing your logging level to debug and see the exact
    error message in hive.log?
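
    For example, using the same -hiveconf mechanism that appears later in
    this thread (a sketch; the console appender is one option):

    bin/hive -hiveconf hive.root.logger=DEBUG,console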

    Thanks and Regards,
    Sonal
    www.meghsoft.com
    http://in.linkedin.com/in/sonalgoyal


  • Shuja Rehman at Jun 10, 2010 at 11:58 am
    I have changed the logging level with this command:

    *bin/hive -hiveconf hive.root.logger=INFO,console*

    and the output is

    ------------------------------------------------------------------------------------------------------------------------------
    10/06/10 13:51:20 INFO parse.ParseDriver: Parsing command: INSERT OVERWRITE
    TABLE test_new
    SELECT
    TRANSFORM (xmlfile)
    USING 'sampleMapper.groovy'
    AS (b,c)
    FROM test
    10/06/10 13:51:20 INFO parse.ParseDriver: Parse Completed
    10/06/10 13:51:20 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
    10/06/10 13:51:20 INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic
    Analysis
    10/06/10 13:51:20 INFO parse.SemanticAnalyzer: Get metadata for source
    tables
    10/06/10 13:51:20 INFO metastore.HiveMetaStore: 0: Opening raw store with
    implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    10/06/10 13:51:20 INFO metastore.ObjectStore: ObjectStore, initialize called
    10/06/10 13:51:20 ERROR DataNucleus.Plugin: Bundle "org.eclipse.jdt.core"
    requires "org.eclipse.core.resources" but it cannot be resolved.
    10/06/10 13:51:20 ERROR DataNucleus.Plugin: Bundle "org.eclipse.jdt.core"
    requires "org.eclipse.core.runtime" but it cannot be resolved.
    10/06/10 13:51:20 ERROR DataNucleus.Plugin: Bundle "org.eclipse.jdt.core"
    requires "org.eclipse.text" but it cannot be resolved.
    10/06/10 13:51:22 INFO metastore.ObjectStore: Initialized ObjectStore
    10/06/10 13:51:22 INFO metastore.HiveMetaStore: 0: get_table : db=default
    tbl=test
    10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
    10/06/10 13:51:23 INFO parse.SemanticAnalyzer: Get metadata for subqueries
    10/06/10 13:51:23 INFO parse.SemanticAnalyzer: Get metadata for destination
    tables
    10/06/10 13:51:23 INFO metastore.HiveMetaStore: 0: get_table : db=default
    tbl=test_new
    10/06/10 13:51:23 INFO hive.log: DDL: struct test_new { string b, string c}
    10/06/10 13:51:23 INFO parse.SemanticAnalyzer: Completed getting MetaData in
    Semantic Analysis
    10/06/10 13:51:23 INFO hive.log: DDL: struct test_new { string b, string c}
    10/06/10 13:51:23 INFO ppd.OpProcFactory: Processing for FS(3)
    10/06/10 13:51:23 INFO ppd.OpProcFactory: Processing for SCR(2)
    10/06/10 13:51:23 INFO ppd.OpProcFactory: Processing for SEL(1)
    10/06/10 13:51:23 INFO ppd.OpProcFactory: Processing for TS(0)
    10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
    10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
    10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
    10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
    10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
    10/06/10 13:51:23 INFO hive.log: DDL: struct test { string xmlfile}
    10/06/10 13:51:23 INFO parse.SemanticAnalyzer: Completed plan generation
    10/06/10 13:51:23 INFO ql.Driver: Semantic Analysis Completed
    10/06/10 13:51:23 INFO ql.Driver: Returning Hive schema:
    Schema(fieldSchemas:[FieldSchema(name:b, type:string, comment:null),
    FieldSchema(name:c, type:string, comment:null)], properties:null)
    10/06/10 13:51:23 INFO ql.Driver: query plan =
    file:/tmp/root/hive_2010-06-10_13-51-20_112_5091815325633732890/queryplan.xml
    10/06/10 13:51:24 INFO ql.Driver: Starting command: INSERT OVERWRITE TABLE
    test_new
    SELECT
    TRANSFORM (xmlfile)
    USING 'sampleMapper.groovy'
    AS (b,c)
    FROM test
    Total MapReduce jobs = 2
    10/06/10 13:51:24 INFO ql.Driver: Total MapReduce jobs = 2
    Launching Job 1 out of 2
    10/06/10 13:51:24 INFO ql.Driver: Launching Job 1 out of 2
    Number of reduce tasks is set to 0 since there's no reduce operator
    10/06/10 13:51:24 INFO exec.ExecDriver: Number of reduce tasks is set to 0
    since there's no reduce operator
    10/06/10 13:51:24 INFO exec.ExecDriver: Using
    org.apache.hadoop.hive.ql.io.HiveInputFormat
    10/06/10 13:51:24 INFO exec.ExecDriver: Processing alias test
    10/06/10 13:51:24 INFO exec.ExecDriver: Adding input file
    hdfs://localhost:9000/user/hive/warehouse/test
    10/06/10 13:51:24 WARN mapred.JobClient: Use GenericOptionsParser for
    parsing the arguments. Applications should implement Tool for the same.
    10/06/10 13:51:24 INFO mapred.FileInputFormat: Total input paths to process
    : 1
    Starting Job = job_201006101118_0009, Tracking URL =
    http://localhost:50030/jobdetails.jsp?jobid=job_201006101118_0009
    10/06/10 13:51:25 INFO exec.ExecDriver: Starting Job =
    job_201006101118_0009, Tracking URL =
    http://localhost:50030/jobdetails.jsp?jobid=job_201006101118_0009
    Kill Command = /usr/local/hadoop/hadoop-0.20.2/bin/../bin/hadoop job
    -Dmapred.job.tracker=localhost:9001 -kill job_201006101118_0009
    10/06/10 13:51:25 INFO exec.ExecDriver: Kill Command =
    /usr/local/hadoop/hadoop-0.20.2/bin/../bin/hadoop job
    -Dmapred.job.tracker=localhost:9001 -kill job_201006101118_0009
    2010-06-10 13:51:32,255 Stage-1 map = 0%, reduce = 0%
    10/06/10 13:51:32 INFO exec.ExecDriver: 2010-06-10 13:51:32,255 Stage-1 map
    = 0%, reduce = 0%
    2010-06-10 13:51:35,305 Stage-1 map = 50%, reduce = 0%
    10/06/10 13:51:35 INFO exec.ExecDriver: 2010-06-10 13:51:35,305 Stage-1 map
    = 50%, reduce = 0%
    2010-06-10 13:51:58,505 Stage-1 map = 100%, reduce = 100%
    10/06/10 13:51:58 INFO exec.ExecDriver: 2010-06-10 13:51:58,505 Stage-1 map
    = 100%, reduce = 100%
    Ended Job = job_201006101118_0009 with errors
    10/06/10 13:51:58 ERROR exec.ExecDriver: Ended Job = job_201006101118_0009
    with errors

    Task with the most failures(4):
    -----
    Task ID:
    task_201006101118_0009_m_000000

    URL:

    http://localhost:50030/taskdetails.jsp?jobid=job_201006101118_0009&tipid=task_201006101118_0009_m_000000
    -----

    10/06/10 13:51:58 ERROR exec.ExecDriver:
    Task with the most failures(4):
    -----
    Task ID:
    task_201006101118_0009_m_000000

    URL:

    http://localhost:50030/taskdetails.jsp?jobid=job_201006101118_0009&tipid=task_201006101118_0009_m_000000
    -----

    FAILED: Execution Error, return code 2 from
    org.apache.hadoop.hive.ql.exec.ExecDriver
    10/06/10 13:51:58 ERROR ql.Driver: FAILED: Execution Error, return code 2
    from org.apache.hadoop.hive.ql.exec.ExecDriver

    -----------------------------------------------------------------------------------------------------------------------------

    Any clue?


    --
    Regards
    Shuja-ur-Rehman Baig
    _________________________________
    MS CS - School of Science and Engineering
    Lahore University of Management Sciences (LUMS)
    Sector U, DHA, Lahore, 54792, Pakistan
    Cell: +92 3214207445
  • Shuja Rehman at Jun 10, 2010 at 12:01 pm
    And on the link
    http://localhost:50030/jobfailures.jsp?jobid=job_201006101118_0009&kind=map&cause=failed

    I have found this output.

    java.lang.RuntimeException:
    org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
    while processing row {"xmlfile":"<xy><a><b>Hello</b><c>world</c></a></xy>"}
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:171)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive
    Runtime Error while processing row {"xmlfile":"<xy><a><b>Hello</b><c>world</c></a></xy>"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:417)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:153)
    ... 4 more
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot
    initialize ScriptOperator
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:319)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:696)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:696)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:45)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:696)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:400)
    ... 5 more
    Caused by: java.io.IOException: Cannot run program
    "sampleMapper.groovy": java.io.IOException: error=2, No such file or
    directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279)
    ... 14 more
    Caused by: java.io.IOException: java.io.IOException: error=2, No such
    file or directory
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
    at java.lang.ProcessImpl.start(ProcessImpl.java:65)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
    ... 15 more
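
    One quick check at this point would be to list the resources registered
    with add FILE from the Hive CLI (a sketch; LIST FILES is the CLI command
    for inspecting added resources):

    hive> list FILES;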





    --
    Regards
    Shuja-ur-Rehman Baig
    _________________________________
    MS CS - School of Science and Engineering
    Lahore University of Management Sciences (LUMS)
    Sector U, DHA, Lahore, 54792, Pakistan
    Cell: +92 3214207445
  • Tomasz Domański at Jun 11, 2010 at 3:36 pm
    Hi Shuja,

    the answer seems to be in these lines:

    Caused by: java.io.IOException: Cannot run program
    "sampleMapper.groovy": java.io.IOException: error=2, No such file or
    directory

    Hadoop can't see this file or can't run it.

    1. Make sure you added the file correctly.
    2. Check whether Hadoop can run the script on your Hadoop machines.

    Can you run this script in a console on a Hadoop machine as
    sampleMapper.groovy
    or do you run it as:
    groovy sampleMapper.groovy
    Maybe you need to specify that groovy is required to run your script.

    Try changing your select into: " ... USING 'groovy sampleMapper.groovy'
    ... "

  • Shuja Rehman at Jun 11, 2010 at 3:44 pm
    Hi Tomasz,
    Thanks for the answer. The problem is solved now; the exception was due
    to the file that was missing before. The program now runs fine if the
    whole XML file is on one line, i.e. it contains no '\n'. But the
    underlying problem is that Hive does not support a row terminator other
    than '\n', according to my research. So I want to load the whole XML
    file into a single row and a single column, so that the Groovy script
    gets the whole XML file as input and can then parse it.

    Please let me know how to do this.
    Thanks
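
    One workaround consistent with what already works above is to collapse
    the file to a single physical line before loading it, so Hive's default
    '\n' record delimiter sees exactly one row (a sketch; flatten.groovy
    and the file names are made up for illustration):

    // flatten.groovy: strip the newlines from 1.xml so the whole document
    // becomes one line, then load the flattened copy instead
    new File('1-flat.xml').text = new File('1.xml').text.replaceAll(/\r?\n/, '')

    followed by:

    LOAD DATA LOCAL INPATH '1-flat.xml' OVERWRITE INTO TABLE test;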


    --
    Regards
    Shuja-ur-Rehman Baig
    _________________________________
    MS CS - School of Science and Engineering
    Lahore University of Management Sciences (LUMS)
    Sector U, DHA, Lahore, 54792, Pakistan
    Cell: +92 3214207445
  • Shuja Rehman at Jun 10, 2010 at 10:54 pm
    Hi Ashish,

    Can you tell me how to create a table using \001 as the record
    delimiter? I am trying this:

    *create table test (xmlFile String) ROW FORMAT DELIMITED FIELDS
    TERMINATED BY '\t' LINES TERMINATED BY '\001';*

    but it is giving me an error saying:

    *ERROR ql.Driver: FAILED: Error in semantic analysis: LINES TERMINATED BY
    only supports newline '\n' right now*



    --
    Regards
    Shuja-ur-Rehman Baig
    _________________________________
    MS CS - School of Science and Engineering
    Lahore University of Management Sciences (LUMS)
    Sector U, DHA, Lahore, 54792, Pakistan
    Cell: +92 3214207445
