Hi,
I have a problem loading data into a partitioned table from a remote
machine (Hive is on Hadoop 0.20). When I try to execute a query, for example:

LOAD DATA LOCAL INPATH 'file' INTO TABLE wh_im_status PARTITION
(day='2010-10-02');

Hive on the machine where I run this query reports:
Copying data from file: file
Loading data to table wh_im_status partition {day=2010-10-11}
Failed with exception org.apache.thrift.TApplicationException:
get_partition failed: unknown result
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask

and the Thrift server logs only this:
10/05/24 19:39:14 INFO metastore.HiveMetaStore: 1: get_table :
db=default tbl=wh_im_status
10/05/24 19:39:14 INFO metastore.HiveMetaStore: 1: get_partition :
db=default tbl=wh_im_status

When I load data into the table without specifying a partition, it works.
When I load data into Hive from the local machine while specifying a
partition, it also works. Where is the problem?
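
For context, a partitioned Hive table like wh_im_status declares its
partition column separately from the data columns. A minimal sketch (the
data columns here are hypothetical; this thread only establishes a string
partition column named day):

CREATE TABLE wh_im_status (
  user_id BIGINT,              -- hypothetical data columns; the real
  status STRING                -- schema is not shown in this thread
)
PARTITIONED BY (day STRING)    -- the partition column used in the LOAD
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;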

--
Aleksander Siewierski
Software Engineer
Gadu Radio Tech

GG Network S.A.
Al. Stanów Zjednoczonych 61, 04-028 Warszawa
tel.: +48 22 4277900 ext. 135, fax: 22 / 514 64 98
GG: 16109

Company registered in the District Court for the Capital City of Warsaw,
13th Commercial Division of the National Court Register, KRS number
0000264575, NIP 867-19-48-977.
Share capital: PLN 1,758,461.10, paid in full.


  • Aleksander Siewierski / Gadu-Gadu at May 25, 2010 at 4:10 pm
    More precisely:

    We have a Hive server (which is also the HDFS host); Hive on this machine
    is configured to use a MySQL database on another host. The server is
    launched with the "hive --service metastore" command.
    While the server is running I get worrying log messages (which may be the
    cause of this problem):
    10/05/25 16:38:26 WARN DataNucleus.MetaData: MetaData Parser encountered
    an error in file
    "jar:file:/usr/lib/hive/lib/hive-metastore-0.5.0.jar!/package.jdo" at
    line 4, column 6 : cvc-elt.1: Cannot find the declaration of element
    'jdo'. - Please check your specification of DTD and the validity of the
    MetaData XML that you have specified.
    10/05/25 16:38:26 WARN DataNucleus.MetaData: MetaData Parser encountered
    an error in file
    "jar:file:/usr/lib/hive/lib/hive-metastore-0.5.0.jar!/package.jdo" at
    line 282, column 13 : The content of element type "class" must match
    "(extension*,implements*,datastore-identity?,primary-key?,inheritance?,version?,join*,foreign-key*,index*,unique*,column*,field*,property*,query*,fetch-group*,extension*)".
    - Please check your specification of DTD and the validity of the
    MetaData XML that you have specified.

    We also have a Hive client, which is configured to use the Hive server
    mentioned above.

    When launching Hive on the client:

    query:
    LOAD DATA LOCAL INPATH 'file' INTO TABLE wh_im_status PARTITION
    (day='2010-10-02');

    response:
    Copying data from file: file
    Loading data to table wh_im_status partition {day=2010-10-11}
    Failed with exception org.apache.thrift.TApplicationException:
    get_partition failed: unknown result
    FAILED: Execution Error, return code 1 from
    org.apache.hadoop.hive.ql.exec.MoveTask

    server logs while executing the above query:
    10/05/24 19:39:14 INFO metastore.HiveMetaStore: 1: get_table :
    db=default tbl=wh_im_status
    10/05/24 19:39:14 INFO metastore.HiveMetaStore: 1: get_partition :
    db=default tbl=wh_im_status

    This problem also appears when using the Python Hive library from the client host.


    When I launch Hive on the Hive server host, the same query as quoted above
    works fine.
    When I create a table wh_im_status2 without partitions, loading data works
    fine, so this problem is strictly connected with pushing data into
    partitions through Thrift.


    Our main goal is to load partitioned data from remote hosts into Hadoop
    Hive. Perhaps you achieve that goal in another way?
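
    For comparison, one way to reach that goal without LOCAL paths (a sketch:
    /tmp/staging/file is a hypothetical path, and the remote host is assumed
    to be able to write to HDFS) is to stage the file in HDFS and load it
    from there, though note that LOAD ... PARTITION still makes the same
    metastore get_partition call that fails here:

    -- First stage the file into HDFS from the remote host, e.g.:
    --   hadoop fs -put file /tmp/staging/file
    -- Then load it without the LOCAL keyword; the path is resolved
    -- against HDFS rather than the local filesystem:
    LOAD DATA INPATH '/tmp/staging/file' INTO TABLE wh_im_status
    PARTITION (day='2010-10-02');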




    Full configuration:

    Server:

    <configuration>
    <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive-${user.name}</value>
    <description>Scratch space for Hive jobs</description>
    </property>

    <property>
    <name>javax.jdo.option.ConnectionURL</name>

    <value>jdbc:mysql://hive-test-db-1.test/hive_metastore?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
    </property>

    <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
    </property>

    <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive_test</value>
    </property>

    <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive_test</value>
    </property>

    <property>
    <name>hive.metastore.metadb.dir</name>
    <value>file:///var/metastore/metadb/</value>
    <description>The location of filestore metadata base dir</description>
    </property>

    <property>
    <name>hive.metastore.uris</name>
    <value>file:///var/lib/hivevar/metastore/metadb/</value>
    <description>Comma separated list of URIs of metastore servers. The
    first server that can be connected to will be used.</description>
    </property>

    <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/var/warehouse</value>
    <description>location of default database for the warehouse</description>
    </property>

    <property>
    <name>hive.metastore.connect.retries</name>
    <value>5</value>
    <description>Number of retries while opening a connection to
    metastore</description>
    </property>

    <property>
    <name>hive.metastore.rawstore.impl</name>
    <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
    <description>Name of the class that implements the
    org.apache.hadoop.hive.metastore.rawstore interface. This class is used
    to store and retrieve raw metadata objects such as tables and
    databases</description>
    </property>

    <property>
    <name>hive.default.fileformat</name>
    <value>TextFile</value>
    <description>Default file format for CREATE TABLE statement. Options
    are TextFile and SequenceFile. Users can explicitly say CREATE TABLE ...
    STORED AS &lt;TEXTFILE|SEQUENCEFILE&gt; to override</description>
    </property>

    <property>
    <name>hive.map.aggr</name>
    <value>false</value>
    <description>Whether to use map-side aggregation in Hive Group By
    queries</description>
    </property>

    <property>
    <name>hive.join.emit.interval</name>
    <value>1000</value>
    <description>How many rows in the right-most join operand Hive should
    buffer before emitting the join result. </description>
    </property>

    <property>
    <name>hive.exec.script.maxerrsize</name>
    <value>100000</value>
    <description>Maximum number of bytes a script is allowed to emit to
    standard error (per map-reduce task). This prevents runaway scripts from
    filling log partitions to capacity</description>
    </property>

    <property>
    <name>hive.exec.compress.output</name>
    <value>false</value>
    <description>This controls whether the final outputs of a query (to
    a local/hdfs file or a hive table) are compressed. The compression codec
    and other options are determined from hadoop config variables
    mapred.output.compress*</description>
    </property>

    <property>
    <name>hive.exec.compress.intermediate</name>
    <value>false</value>
    <description> This controls whether intermediate files produced by
    hive between multiple map-reduce jobs are compressed. The compression
    codec and other options are determined from hadoop config variables
    mapred.output.compress* </description>
    </property>

    <property>
    <name>hive.hwi.listen.host</name>
    <value>0.0.0.0</value>
    <description>This is the host address the Hive Web Interface will
    listen on</description>
    </property>

    <property>
    <name>hive.hwi.listen.port</name>
    <value>9999</value>
    <description>This is the port the Hive Web Interface will listen
    on</description>
    </property>

    <property>
    <name>hive.hwi.war.file</name>
    <value>/usr/lib/hive/lib/hive-hwi-0.5.0.war</value>
    <description>This is the WAR file with the jsp content for Hive Web
    Interface</description>
    </property>

    </configuration>



    Client:

    <configuration>
    <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive-${user.name}</value>
    <description>Scratch space for Hive jobs</description>
    </property>

    <property>
    <name>hive.metastore.local</name>
    <value>false</value>
    <description>controls whether to connect to a remote metastore server
    or open a new metastore in the Hive client JVM</description>
    </property>

    <property>
    <name>hive.metastore.metadb.dir</name>
    <value>file:///var/metastore/metadb/</value>
    <description>The location of filestore metadata base dir</description>
    </property>

    <property>
    <name>hive.metastore.uris</name>
    <value>thrift://test-storage-1.atm:9083</value>
    <description>Comma separated list of URIs of metastore servers. The
    first server that can be connected to will be used.</description>
    </property>

    <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://test-storage-1.atm:54310/var/warehouse</value>
    <description>location of default database for the warehouse</description>
    </property>

    <property>
    <name>hive.metastore.connect.retries</name>
    <value>5</value>
    <description>Number of retries while opening a connection to
    metastore</description>
    </property>

    <property>
    <name>hive.metastore.rawstore.impl</name>
    <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
    <description>Name of the class that implements the
    org.apache.hadoop.hive.metastore.rawstore interface. This class is used
    to store and retrieve raw metadata objects such as tables and
    databases</description>
    </property>

    <property>
    <name>hive.default.fileformat</name>
    <value>TextFile</value>
    <description>Default file format for CREATE TABLE statement. Options
    are TextFile and SequenceFile. Users can explicitly say CREATE TABLE ...
    STORED AS &lt;TEXTFILE|SEQUENCEFILE&gt; to override</description>
    </property>

    <property>
    <name>hive.map.aggr</name>
    <value>false</value>
    <description>Whether to use map-side aggregation in Hive Group By
    queries</description>
    </property>

    <property>
    <name>hive.join.emit.interval</name>
    <value>1000</value>
    <description>How many rows in the right-most join operand Hive should
    buffer before emitting the join result. </description>
    </property>

    <property>
    <name>hive.exec.script.maxerrsize</name>
    <value>100000</value>
    <description>Maximum number of bytes a script is allowed to emit to
    standard error (per map-reduce task). This prevents runaway scripts from
    filling log partitions to capacity</description>
    </property>

    <property>
    <name>hive.exec.compress.output</name>
    <value>false</value>
    <description>This controls whether the final outputs of a query (to
    a local/hdfs file or a hive table) are compressed. The compression codec
    and other options are determined from hadoop config variables
    mapred.output.compress*</description>
    </property>

    <property>
    <name>hive.exec.compress.intermediate</name>
    <value>false</value>
    <description> This controls whether intermediate files produced by
    hive between multiple map-reduce jobs are compressed. The compression
    codec and other options are determined from hadoop config variables
    mapred.output.compress* </description>
    </property>

    <property>
    <name>hive.hwi.listen.host</name>
    <value>0.0.0.0</value>
    <description>This is the host address the Hive Web Interface will
    listen on</description>
    </property>

    <property>
    <name>hive.hwi.listen.port</name>
    <value>9999</value>
    <description>This is the port the Hive Web Interface will listen
    on</description>
    </property>

    <property>
    <name>hive.hwi.war.file</name>
    <value>/usr/lib/hive/lib/hive_hwi.war</value>
    <description>This is the WAR file with the jsp content for Hive Web
    Interface</description>
    </property>

    </configuration>

    --
    Aleksander Siewierski
  • Edward Capriolo at May 25, 2010 at 5:24 pm

    On Tue, May 25, 2010 at 12:09 PM, Aleksander Siewierski / Gadu-Gadu wrote:

    Our main goal is to load partitioned data from remote hosts into Hadoop
    Hive. Perhaps you achieve that goal in another way?

    You cannot load data via Hive like this.

    LOAD DATA LOCAL INPATH 'XX' attempts to load data from the node launching
    Hive.

    If your client is the CLI, this works, as the CLI is running on the same
    node as the data.

    If your client goes through the Hive service, the file would have to be
    located on the machine running the Hive service, not on your current host.
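
    To illustrate the difference (a sketch; both paths are hypothetical):

    -- LOCAL: the path is resolved on the host executing the statement
    -- (the service host when going through Thrift), so the file must
    -- exist there:
    LOAD DATA LOCAL INPATH '/home/user/file' INTO TABLE wh_im_status
    PARTITION (day='2010-10-02');

    -- Without LOCAL: the path is an HDFS path, so it behaves the same
    -- from any host:
    LOAD DATA INPATH '/tmp/staging/file' INTO TABLE wh_im_status
    PARTITION (day='2010-10-02');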

    Edward
  • Aleksander Siewierski / Gadu-Gadu at May 26, 2010 at 8:25 am

    Edward Capriolo wrote:
    Our main goal is to load partitioned data from remote hosts into Hadoop
    Hive. Perhaps you achieve that goal in another way?

    You cannot load data via Hive like this.

    LOAD DATA LOCAL INPATH 'XX' attempts to load data from the node
    launching Hive.

    If your client is the CLI, this works, as the CLI is running on the same
    node as the data.

    If your client goes through the Hive service, the file would have to be
    located on the machine running the Hive service, not on your current host.

    Edward

    The problem is not the location of the data files: LOAD works when no
    partition is specified for the data. If you specify a partition, there is
    the error: get_partition failed: unknown result.
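
    Concretely, the two cases (a sketch; wh_im_status2 is the unpartitioned
    table mentioned earlier):

    -- Works from the remote client (no partition involved):
    LOAD DATA LOCAL INPATH 'file' INTO TABLE wh_im_status2;

    -- Fails from the remote client with 'get_partition failed: unknown result':
    LOAD DATA LOCAL INPATH 'file' INTO TABLE wh_im_status
    PARTITION (day='2010-10-02');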



    --
    Aleksander Siewierski
    Software Engineer
    Gadu Radio Tech

    GG Network S.A.
    Al. Stanów Zjednoczonych 61, 04-028 Warszawa
    tel.: +48 22 4277900 ext. 135, fax: 22 / 514 64 98
    GG: 16109

    Company registered in the District Court for the Capital City of Warsaw,
    13th Commercial Division of the National Court Register, KRS number
    0000264575, NIP 867-19-48-977.
    Share capital: PLN 1,758,461.10, paid in full.
  • Aleksander Siewierski / Gadu-Gadu at May 26, 2010 at 11:37 am
    What is more, when the partition was created earlier by loading a file
    locally, and I then try to load a file into this existing partition from
    the remote machine, it WORKS. Why? Is the problem with creating a new
    partition?

    Adding the partition before loading the file, via alter table ... add
    partition, doesn't help: the partition is created, but the error is the
    same, 'get_partition failed'.
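
    For reference, the failing sequence described above (a sketch):

    -- The partition itself is created successfully...
    ALTER TABLE wh_im_status ADD PARTITION (day='2010-10-02');

    -- ...but the remote load into it still fails with 'get_partition failed':
    LOAD DATA LOCAL INPATH 'file' INTO TABLE wh_im_status
    PARTITION (day='2010-10-02');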




    --
    Aleksander Siewierski
