Hi,

I am writing a Hadoop application that uses HBase as both source and sink.

There is no reducer in my job; it is map-only.

I am using TableOutputFormat as the OutputFormatClass.
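
For context, my setup is roughly the sketch below (a stripped-down, map-only copy job; the class names, table names, and the mapper body are placeholders, not my real code):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.mapreduce.Job;

public class HBaseSourceSinkJob {

  // Map-only step: read a row from the source table and emit a Put for the sink table.
  static class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      Put put = new Put(row.get());
      for (KeyValue kv : value.raw()) {
        put.add(kv);                       // copy every cell as-is, just to have something to write
      }
      context.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set(TableOutputFormat.OUTPUT_TABLE, "sink_table");   // placeholder sink table name

    Job job = new Job(conf, "hbase source+sink, map-only");
    job.setJarByClass(HBaseSourceSinkJob.class);

    // HBase as source: full scan of "source_table" feeding CopyMapper.
    TableMapReduceUtil.initTableMapperJob(
        "source_table", new Scan(), CopyMapper.class,
        ImmutableBytesWritable.class, Put.class, job);

    // HBase as sink: TableOutputFormat writes each Put the mapper emits.
    job.setOutputFormatClass(TableOutputFormat.class);
    job.setNumReduceTasks(0);   // no reducer

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The mapper just re-emits each row as a Put, and TableOutputFormat is supposed to write those Puts into the sink table.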

I read on the Internet that, in some people's experiments, it is faster to
instantiate HTable directly and call HTable.batch() inside the mapper
than to write through TableOutputFormat.
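
If I understood that suggestion correctly, it would look something like this: open the HTable in setup(), buffer the Puts, and flush them with batch() every so often and again in cleanup(). This is only my reading of the suggestion, with placeholder table and column-family names; I have not benchmarked it:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;

// Mapper that writes to HBase itself instead of going through TableOutputFormat.
public class BatchingMapper extends TableMapper<NullWritable, NullWritable> {

  private static final int BATCH_SIZE = 1000;   // arbitrary batch size

  private HTable table;
  private List<Row> buffer = new ArrayList<Row>();

  @Override
  protected void setup(Context context) throws IOException {
    // Open the sink table directly inside the task.
    table = new HTable(HBaseConfiguration.create(context.getConfiguration()), "sink_table");
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    Put put = new Put(row.get());
    // Placeholder cell; in the real job the Put would be built from 'value'.
    put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), row.get());
    buffer.add(put);
    if (buffer.size() >= BATCH_SIZE) {
      flush();
    }
  }

  private void flush() throws IOException {
    try {
      table.batch(buffer);   // send the whole buffer to the client in one batch() call
    } catch (InterruptedException e) {
      throw new IOException(e);
    }
    buffer.clear();
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    flush();                 // push any leftover Puts
    table.close();
  }
}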

So I looked into the source code of
org.apache.hadoop.hbase.mapreduce.TableOutputFormat.
It looks as if TableRecordWriter does not do batch updates, since
TableRecordWriter.write() just calls HTable.put(new Put()) for each record.
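
To spell out the contrast I am asking about, it boils down to the two write paths in this toy, standalone snippet (placeholder table and column-family names, written only to illustrate the question):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.util.Bytes;

// Toy comparison of the two write paths (table and family names are placeholders).
public class PutVsBatch {
  public static void main(String[] args) throws IOException, InterruptedException {
    HTable table = new HTable(HBaseConfiguration.create(), "sink_table");

    List<Row> puts = new ArrayList<Row>();
    for (int i = 0; i < 10; i++) {
      Put p = new Put(Bytes.toBytes("row-" + i));
      p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v" + i));
      puts.add(p);
    }

    // What TableRecordWriter.write() appears to amount to: one put() call per record.
    for (Row r : puts) {
      table.put((Put) r);
    }

    // What the "faster" suggestion amounts to: submit everything in one batch() call.
    table.batch(puts);

    table.close();
  }
}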

Am I right about this? Or does TableOutputFormat somehow do batch
updates automatically?
Or is there a specific way to do batch updates with TableOutputFormat?

Any explanation is greatly appreciated.

Ed
