HBase dev mailing list, March 2010
Optimize M-R by bulk-excluding regions: fewer InputSplits to avoid traffic on region servers when performing M-R on a subset of the schema

Key: HBASE-2302
Project: Hadoop HBase
Issue Type: Improvement
Reporter: Kay Kay

TableInputFormatBase creates one InputSplit per region. Because the row keys are sorted, an M-R job sometimes needs to run over only a subset of the keyset, which corresponds to a subset of the regions. Adding a provision to filter out regions when generating InputSplits would be useful.
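
A minimal sketch of the idea, assuming each region is described by a start key and an exclusive end key, with an empty array meaning "unbounded". Because keys are sorted, a requested key range overlaps a contiguous run of regions, and only those need a split. The `Region` class and the helper names are hypothetical stand-ins, not HBase classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: select only the regions whose key range overlaps [start, stop).
// Region is a hypothetical stand-in, not an HBase class.
public class RegionSplits {

    // A region covers [startKey, endKey); an empty array means unbounded on that side.
    static final class Region {
        final byte[] startKey;
        final byte[] endKey;

        Region(byte[] startKey, byte[] endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // True if region [startKey, endKey) overlaps the requested range [start, stop).
    // An empty start or stop means the range is open on that side.
    static boolean overlaps(Region r, byte[] start, byte[] stop) {
        boolean endsAfterStart = r.endKey.length == 0
            || Arrays.compareUnsigned(r.endKey, start) > 0;
        boolean startsBeforeStop = stop.length == 0
            || Arrays.compareUnsigned(r.startKey, stop) < 0;
        return endsAfterStart && startsBeforeStop;
    }

    // Regions that would each receive one split; all others are skipped up front.
    static List<Region> regionsForRange(List<Region> regions, byte[] start, byte[] stop) {
        List<Region> selected = new ArrayList<>();
        for (Region r : regions) {
            if (overlaps(r, start, stop)) {
                selected.add(r);
            }
        }
        return selected;
    }

    // Four sample regions: [,d) [d,m) [m,t) [t,)
    static List<Region> sampleRegions() {
        return List.of(
            new Region(key(""), key("d")),
            new Region(key("d"), key("m")),
            new Region(key("m"), key("t")),
            new Region(key("t"), key("")));
    }

    public static void main(String[] args) {
        List<Region> hit = regionsForRange(sampleRegions(), key("e"), key("n"));
        System.out.println(hit.size() + " of " + sampleRegions().size() + " regions need a split");
    }
}
```

With the sample regions, the range [e, n) touches only [d,m) and [m,t), so two map tasks are created instead of four and the other region servers see no scan traffic. Key comparison uses unsigned byte order (`Arrays.compareUnsigned`, Java 9+), on the assumption that this matches the table's key ordering.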

Exclusion works at per-region granularity. A RowFilter may still be needed when scanning an individual region, but that is a separate issue altogether.
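
To illustrate why per-region exclusion alone is coarse, here is a sketch assuming a requested scan range [scanStart, scanStop) and a region [regionStart, regionEnd), with empty arrays meaning "unbounded". An included region's split can at most be narrowed to the intersection of the two ranges; picking individual rows inside that interval is left to a per-row filter such as a RowFilter. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: clip an included region's bounds to the requested scan range.
// Rows inside the clipped bounds that should not be processed still need a
// per-row filter; that is outside what region exclusion can do.
public class ClipSplit {

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Later of two start keys; an empty array means "from the beginning of the table".
    static byte[] maxStart(byte[] a, byte[] b) {
        if (a.length == 0) return b;
        if (b.length == 0) return a;
        return Arrays.compareUnsigned(a, b) >= 0 ? a : b;
    }

    // Earlier of two stop keys; an empty array means "to the end of the table".
    static byte[] minStop(byte[] a, byte[] b) {
        if (a.length == 0) return b;
        if (b.length == 0) return a;
        return Arrays.compareUnsigned(a, b) <= 0 ? a : b;
    }

    // Returns {splitStart, splitStop}: region [regionStart, regionEnd)
    // intersected with the requested range [scanStart, scanStop).
    static byte[][] clip(byte[] regionStart, byte[] regionEnd, byte[] scanStart, byte[] scanStop) {
        return new byte[][] { maxStart(regionStart, scanStart), minStop(regionEnd, scanStop) };
    }

    public static void main(String[] args) {
        byte[][] split = clip(key("d"), key("m"), key("e"), key("n"));
        System.out.println(new String(split[0], StandardCharsets.UTF_8) + " .. "
            + new String(split[1], StandardCharsets.UTF_8));
    }
}
```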

Methodology: add a way to prune the keyset before generating InputSplits. By default every region is returned, meaning all regions are included; this can be overridden as necessary, depending on the higher-level logic.
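
One way the proposed hook could look, as a sketch only: the split-generating base class asks an overridable method about each region, the default includes everything (so existing jobs are unaffected), and a subclass supplies its own rule. The names `SplitPlanner`, `includeRegionInSplit` and `AfterCutoffPlanner` are used here for illustration and are assumptions, not a confirmed HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Base class: consults includeRegionInSplit() for each region before creating a split.
class SplitPlanner {

    // Default: include every region, so behaviour is unchanged unless overridden.
    protected boolean includeRegionInSplit(byte[] startKey, byte[] endKey) {
        return true;
    }

    // Returns the indexes of regions that get a split; regionBounds[i] = {start, end}.
    final List<Integer> plan(List<byte[][]> regionBounds) {
        List<Integer> included = new ArrayList<>();
        for (int i = 0; i < regionBounds.size(); i++) {
            byte[][] b = regionBounds.get(i);
            if (includeRegionInSplit(b[0], b[1])) {
                included.add(i);
            }
        }
        return included;
    }
}

// Example override: skip regions that lie entirely before a cutoff key.
class AfterCutoffPlanner extends SplitPlanner {
    private final byte[] cutoff;

    AfterCutoffPlanner(byte[] cutoff) {
        this.cutoff = cutoff;
    }

    @Override
    protected boolean includeRegionInSplit(byte[] startKey, byte[] endKey) {
        // End keys are exclusive; an empty endKey marks the last, unbounded region.
        return endKey.length == 0 || Arrays.compareUnsigned(endKey, cutoff) > 0;
    }
}

public class HookDemo {
    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Four sample regions: [,d) [d,m) [m,t) [t,)
    static List<byte[][]> sampleRegions() {
        return List.of(
            new byte[][] { key(""), key("d") },
            new byte[][] { key("d"), key("m") },
            new byte[][] { key("m"), key("t") },
            new byte[][] { key("t"), key("") });
    }

    public static void main(String[] args) {
        System.out.println(new SplitPlanner().plan(sampleRegions()));
        System.out.println(new AfterCutoffPlanner(key("n")).plan(sampleRegions()));
    }
}
```

The default planner keeps all four regions, while the cutoff "n" drops the two regions whose keys all sort before it. Keeping the decision in a single overridable method leaves split creation itself untouched and pushes the job-specific logic into the subclass.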


Discussion overview
group: dev
categories: hbase, hadoop
posted: Mar 9, '10 at 3:54a
active: Mar 9, '10 at 3:54a
users in discussion: 1 (Kay Kay (JIRA): 1 post)