Hello,

I'm testing Hadoop and HBase. I can run MapReduce streaming or pipes jobs
against text files on Hadoop, but I have a problem when I try to run the same
job against an HBase table.

The table looks like this:
hbase(main):015:0> scan 'table1'
ROW     COLUMN+CELL
row1    column=family1:a, timestamp=1298037737154, value=1
row1    column=family1:b, timestamp=1298037744658, value=2
row1    column=family1:c, timestamp=1298037748020, value=3
row2    column=family1:a, timestamp=1298037755440, value=11
row2    column=family1:b, timestamp=1298037758241, value=22
row2    column=family1:c, timestamp=1298037761198, value=33
row3    column=family1:a, timestamp=1298037767127, value=111
row3    column=family1:b, timestamp=1298037770111, value=222
row3    column=family1:c, timestamp=1298037774954, value=333
3 row(s) in 0.0240 seconds


And the command I use, with the exception I get:

# hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2+737.jar \
    -D hbase.mapred.tablecolumn=family1: -input table1 -output /mtestout45 \
    -mapper test-map -numReduceTasks 1 -reducer test-reduce \
    -inputformat org.apache.hadoop.hbase.mapred.TableInputFormat

packageJobJar: [/var/lib/hadoop/cache/root/hadoop-unjar8960137205806573426/] []
/tmp/streamjob8218197708173702571.jar tmpDir=null
11/02/18 14:45:48 INFO mapred.JobClient: Cleaning up the staging area
hdfs://oho-nnm.dev.chservices.cz/var/lib/hadoop/cache/mapred/mapred/staging/root/.staging/job_201102151449_0035
Exception in thread "main" java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:597)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:926)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:918)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:834)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:793)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:793)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:767)
at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:922)
at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:123)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 23 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hbase.mapred.TableInputFormat.configure(TableInputFormat.java:51)
... 28 more


Can anyone tell me what I am doing wrong?

Regards,
Ondrej

  • Jean-Daniel Cryans at Feb 18, 2011 at 7:02 pm
    You have a typo, it's hbase.mapred.tablecolumns not hbase.mapred.tablecolumn
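
    For reference, the corrected invocation (everything else from the command
    above unchanged):

    # hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2+737.jar \
        -D hbase.mapred.tablecolumns=family1: -input table1 -output /mtestout45 \
        -mapper test-map -numReduceTasks 1 -reducer test-reduce \
        -inputformat org.apache.hadoop.hbase.mapred.TableInputFormat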

    J-D
  • Ondrej Holecek at Feb 19, 2011 at 1:03 pm
    Thank you, I've spent a lot of time debugging but didn't notice this typo :(

    Now it works, but I don't understand one thing: on stdin I get this:

    72 6f 77 31 keyvalues={row1/family1:a/1298037737154/Put/vlen=1,
    row1/family1:b/1298037744658/Put/vlen=1, row1/family1:c/1298037748020/Put/vlen=1}
    72 6f 77 32 keyvalues={row2/family1:a/1298037755440/Put/vlen=2,
    row2/family1:b/1298037758241/Put/vlen=2, row2/family1:c/1298037761198/Put/vlen=2}
    72 6f 77 33 keyvalues={row3/family1:a/1298037767127/Put/vlen=3,
    row3/family1:b/1298037770111/Put/vlen=3, row3/family1:c/1298037774954/Put/vlen=3}

    I see there is everything but the value. What should I do to get the value
    on stdin too?

    Ondrej
    On 02/18/11 20:01, Jean-Daniel Cryans wrote:
    You have a typo, it's hbase.mapred.tablecolumns not hbase.mapred.tablecolumn
  • ShengChang Gu at Feb 19, 2011 at 3:42 pm
    By default, the prefix of a line up to the first tab character is the key
    and the rest of the line (excluding the tab character) is the value. If
    there is no tab character in the line, the entire line is treated as the
    key and the value is null. However, this can be customized; use:

    -D stream.map.output.field.separator=.
    -D stream.num.map.output.key.fields=4
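
    For example (a sketch: the in/out paths and the my-map/my-reduce scripts
    are placeholders), a mapper output line "a.b.c.d.e" would then be split
    into the key "a.b.c.d" and the value "e":

    # hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2+737.jar \
        -D stream.map.output.field.separator=. \
        -D stream.num.map.output.key.fields=4 \
        -input in -output out -mapper my-map -reducer my-reduce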

    2011/2/19 Ondrej Holecek <ondrej@holecek.eu>:
    I see there is everything but the value. What should I do to get the value
    on stdin too?

    --
    阿昌
  • Ondrej Holecek at Feb 19, 2011 at 4:04 pm
    I don't think you understood me correctly.

    I get this line:

    72 6f 77 31 keyvalues={row1/family1:a/1298037737154/Put/vlen=1,
    row1/family1:b/1298037744658/Put/vlen=1, row1/family1:c/1298037748020/Put/vlen=1}

    I know "72 6f 77 31" is the key (the hex bytes of "row1") and the rest is
    the value; let's call it the mapreduce-value. In this mapreduce-value there
    is "row1/family1:a/1298037737154/Put/vlen=1", which is the hbase-row name,
    the hbase-column name, and the hbase-timestamp. But I expect the hbase-value
    too.

    So my question is: what do I have to do to make TableInputFormat send this
    hbase-value as well?


    Ondrej

    On 02/19/11 16:41, ShengChang Gu wrote:
    By default, the prefix of a line up to the first tab character is the key
    and the rest of the line (excluding the tab character) is the value.
  • Jean-Daniel Cryans at Feb 22, 2011 at 8:07 pm
    (moving to the hbase user ML)

    I think streaming used to work correctly in hbase 0.19, since the RowResult
    class exposed the value in its string form (which you had to parse out),
    but now that Result is made of KeyValues, and KeyValue.toString doesn't
    include the value, I don't see how TableInputFormat can be used directly.
    You could write your own InputFormat that wraps TIF and returns a specific
    format for each cell, though.
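
    A minimal, untested sketch of that idea against the old
    org.apache.hadoop.hbase.mapred API (the class name ValueTableInputFormat
    and the family:qualifier=value output format are arbitrary choices):

    import java.io.IOException;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapred.TableInputFormat;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    // Wraps the stock TableInputFormat and re-emits each row as Text, so a
    // streaming mapper sees the cell values themselves, not Result.toString().
    public class ValueTableInputFormat implements InputFormat<Text, Text>, JobConfigurable {
      private final TableInputFormat inner = new TableInputFormat();

      public void configure(JobConf job) { inner.configure(job); }

      public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
        return inner.getSplits(job, numSplits);
      }

      public RecordReader<Text, Text> getRecordReader(InputSplit split, JobConf job,
          Reporter reporter) throws IOException {
        final RecordReader<ImmutableBytesWritable, Result> rr =
            inner.getRecordReader(split, job, reporter);
        return new RecordReader<Text, Text>() {
          public boolean next(Text key, Text value) throws IOException {
            ImmutableBytesWritable k = rr.createKey();
            Result r = rr.createValue();
            if (!rr.next(k, r)) return false;
            key.set(Bytes.toString(k.get()));
            // Render every cell as family:qualifier=value.
            StringBuilder sb = new StringBuilder();
            for (KeyValue kv : r.raw()) {
              if (sb.length() > 0) sb.append(' ');
              sb.append(Bytes.toString(kv.getColumn()))
                .append('=').append(Bytes.toString(kv.getValue()));
            }
            value.set(sb.toString());
            return true;
          }
          public Text createKey() { return new Text(); }
          public Text createValue() { return new Text(); }
          public long getPos() throws IOException { return rr.getPos(); }
          public void close() throws IOException { rr.close(); }
          public float getProgress() throws IOException { return rr.getProgress(); }
        };
      }
    }

    You'd ship that in a jar on the job classpath and pass it with -inputformat
    instead of the stock TIF.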

    Hope that somehow helps,

    J-D

    2011/2/19 Ondrej Holecek <ondrej@holecek.eu>:
    So my question is: what do I have to do to make TableInputFormat send this
    hbase-value as well?

Discussion Overview
group: mapreduce-user
categories: hadoop
posted: Feb 18, '11 at 2:06p
active: Feb 22, '11 at 8:07p
posts: 6
users: 3
website: hadoop.apache.org...
irc: #hadoop
