We just upgraded to Hadoop 0.20 (from Hadoop 0.18); impressively, our
existing Hive package kept working against the new Hadoop setup.

Since the upgrade, though, every Hive job starts with only 1 map task,
even after setting it explicitly, e.g.: set mapred.map.tasks=32;
We recompiled our Hive setup against Hadoop 0.20 and still get the same issue.

Any suggestions for something obvious we might have missed?

~Tim.


  • Namit Jain at Feb 23, 2010 at 7:03 pm
    Can you check your input format?

    Can you check the value of the parameter
    hive.input.format?

    Can you send all the parameters?



    Thanks,
    -namit
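
    For reference, individual parameters can be printed from the Hive CLI
    with `set <name>;`. A sketch (the default input format and the exact
    class name below depend on the Hive release):

    ```sql
    -- Print the current value of a single parameter:
    set hive.input.format;

    -- Dump every parameter at once:
    set;

    -- To rule out split combining, switch back to the plain input format
    -- (one split per file block); class name as of Hive 0.5-era releases:
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    ```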



  • Tim Sell at Feb 23, 2010 at 7:11 pm
    Is hive.input.format set on the table? I'm not sure how to pull that
    out again. I do know the tables are stored as text, and
    I should mention they do actually parse/process correctly.

    Here are all the set parameters

    hive> set;
    silent=off
    javax.jdo.option.ConnectionUserName=hive
    hive.exec.reducers.bytes.per.reducer=100000000
    hive.mapred.local.mem=0
    datanucleus.autoStartMechanismMode=checked
    hive.metastore.connect.retries=5
    datanucleus.validateColumns=false
    hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore
    datanucleus.autoCreateSchema=true
    javax.jdo.option.ConnectionPassword=hive
    datanucleus.validateConstraints=false
    datancucleus.transactionIsolation=read-committed
    datanucleus.validateTables=false
    hive.map.aggr.hash.min.reduction=0.5
    datanucleus.storeManagerType=rdbms
    hive.exec.script.maxerrsize=100000
    hive.merge.size.per.task=256000000
    hive.test.mode.prefix=test_
    hive.groupby.skewindata=false
    hive.default.fileformat=TextFile
    hive.script.auto.progress=false
    hive.groupby.mapaggr.checkinterval=100000
    hive.hwi.listen.port=9999
    datanuclues.cache.level2=true
    hive.hwi.war.file=${HIVE_HOME}/lib/hive-hwi.war
    hive.merge.mapfiles=true
    hive.exec.compress.output=false
    datanuclues.cache.level2.type=SOFT
    javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
    hive.map.aggr=true
    hive.join.emit.interval=1000
    hive.metastore.warehouse.dir=hdfs://master1.hadoop.last.fm:8020/user/hive/warehouse
    javax.jdo.PersistenceManagerFactoryClass=org.datanucleus.jdo.JDOPersistenceManagerFactory
    hive.mapred.mode=nonstrict
    hive.exec.scratchdir=/tmp/hive-${user.name}
    javax.jdo.option.NonTransactionalRead=true
    hive.metastore.local=true
    hive.test.mode.samplefreq=32
    hive.test.mode=false
    javax.jdo.option.ConnectionURL=jdbc:mysql://10.101.1.35/hive?createDatabaseIfNotExist=true
    javax.jdo.option.DetachAllOnCommit=true
    hive.heartbeat.interval=1000
    hive.map.aggr.hash.percentmemory=0.5
    hive.exec.reducers.max=107
    hive.hwi.listen.host=0.0.0.0
    hive.exec.compress.intermediate=false
    hive.optimize.cp=true
    hive.optimize.ppd=true
    hive.session.id=tims_201002231907
    hive.merge.mapredfiles=false

    ~Tim.
  • Tim Sell at Feb 23, 2010 at 7:14 pm
    If it helps: looking at the job conf in the MapReduce logs, I noticed
    mapred.input.format.class=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat

  • Namit Jain at Feb 23, 2010 at 7:21 pm
    What is the size of the input data for the query?

    Since you are using CombineHiveInputFormat, multiple files can be read by a single mapper.



    -namit
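
    Context, hedged: CombineHiveInputFormat (backed by Hadoop's
    CombineFileInputFormat in 0.20) packs many files, or many blocks of one
    large file, into a single split; with no upper bound configured it can
    collapse an entire input into one split and hence one map task. A
    sketch of the knobs that bound combined split sizes, assuming Hadoop
    0.20 parameter names:

    ```sql
    -- Upper bound on bytes per combined split; without it,
    -- CombineFileInputFormat may pack all input into one split (one mapper):
    set mapred.max.split.size=256000000;
    -- Minimum bytes to combine per node / per rack before looking wider:
    set mapred.min.split.size.per.node=128000000;
    set mapred.min.split.size.per.rack=128000000;
    ```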

  • Tim Sell at Feb 23, 2010 at 7:26 pm
    It happens on a table that is a single 30 GB tab-separated file.
    It also happens on tables that are split over hundreds of files.

  • Namit Jain at Feb 23, 2010 at 9:55 pm
    Can you check the parameters mapred.min.split.size and dfs.block.size?

  • Tim Sell at Feb 24, 2010 at 11:25 am
    Hi again,

    mapred.min.split.size=0
    dfs.block.size=134217728
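
    For scale: with mapred.min.split.size=0 and a 128 MB block size, the
    30 GB table mentioned earlier in the thread should, absent split
    combining, produce on the order of:

    ```
    30 GB / 128 MB per split = (30 × 1024) / 128 ≈ 240 map tasks
    ```

    so a single map task points at the split computation (i.e. the
    combining input format), not at the block size.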


  • Tim Sell at Feb 24, 2010 at 1:01 pm
    It's fixed.
    We didn't figure out what caused it, but we seem to have fixed it by
    upgrading to the latest Cloudera version of Hive.

    thanks

Discussion Overview
group: user@
categories: hive, hadoop
posted: Feb 23, '10 at 7:00p
active: Feb 24, '10 at 1:01p
posts: 9
users: 2
website: hive.apache.org

2 users in discussion
Tim Sell: 6 posts. Namit Jain: 3 posts.
