Enable compression in HBase Export
----------------------------------

Key: HBASE-2225
URL: https://issues.apache.org/jira/browse/HBASE-2225
Project: Hadoop HBase
Issue Type: Improvement
Components: util
Affects Versions: 0.20.1
Environment: OS agnostic
Reporter: Ted Yu
Priority: Minor
Original Estimate: 0.5h
Remaining Estimate: 0.5h


org.apache.hadoop.hbase.mapreduce.Export should set compression codec

In createSubmittableJob(), the following should be added:
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, org.apache.hadoop.io.compress.GzipCodec.class);
In my experiments, I observed a 10% to 50% reduction in the size of the Export output.
The SequenceFileInputFormat used by the Import tool is able to detect GzipCodec, so the Import class needs no change.
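
To get a feel for the kind of size reduction described above, the following is a minimal sketch that uses only the JDK's gzip stream, not Hadoop's GzipCodec. The class name, the method names, and the synthetic row data are all made up for illustration, so the ratio it prints says nothing about the 10% to 50% figure reported for real Export output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

/** Illustrative only: measures how much gzip shrinks a synthetic byte array. */
final class GzipRatioSketch {

    /** Returns the gzip-compressed size of the input, in bytes. */
    static int gzipSize(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(input);
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return buffer.size();
    }

    /** Builds made-up rows that repeat the same family and qualifier names. */
    static byte[] syntheticRows(int rowCount) {
        StringBuilder rows = new StringBuilder();
        for (int i = 0; i < rowCount; i++) {
            rows.append("row-").append(i)
                .append("/family:qualifier=value-").append(i % 7)
                .append('\n');
        }
        return rows.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = syntheticRows(10_000);
        int compressed = gzipSize(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes%n", raw.length, compressed);
    }
}
```

Repetitive keys such as family and qualifier names are what make this kind of data compress well; the actual saving depends on the table contents.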

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • stack (JIRA) at Feb 13, 2010 at 6:00 pm
    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833421#action_12833421 ]

    stack commented on HBASE-2225:
    ------------------------------

    I think this should be an option. How about adding it as a command-line flag or something to the export job?
  • Lars George (JIRA) at Feb 13, 2010 at 6:46 pm
    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833434#action_12833434 ]

    Lars George commented on HBASE-2225:
    ------------------------------------

    +1

    I have done the same in the past and can confirm the implicit support by the InputFormat. I added it like this:

    {code}
    // set output stream compression
    if (params.get(CONF_COMPRESS) != null) {
      job.set("mapred.output.compress", "true");
      job.set("mapred.output.compression.codec", "org.apache.hadoop.io.compress.GzipCodec");
    }
    {code}

    where CONF_COMPRESS is a simple command-line switch. This code targets the older mapred API, so Ted's code is more current and can be used as is.

    Ted, you want to make a patch? If not I can add it as well. Let me know.
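
The switch-driven approach in the comment above can be sketched without any Hadoop dependency. This is only an illustration under stated assumptions: the switch name `--compress`, the class name, and the use of a plain map in place of a job configuration are hypothetical, while the two property names and the codec class name are the ones quoted in the comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: turns an opt-in command-line switch into output settings. */
final class ExportCompressionSwitchSketch {

    /** Hypothetical switch name; the thread does not say what the patch calls it. */
    static final String COMPRESS_SWITCH = "--compress";

    /** Returns the compression settings to apply, or an empty map if the switch is absent. */
    static Map<String, String> outputSettings(String[] args) {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String arg : args) {
            if (COMPRESS_SWITCH.equals(arg)) {
                // Property names and codec class as quoted in the comment above.
                settings.put("mapred.output.compress", "true");
                settings.put("mapred.output.compression.codec",
                        "org.apache.hadoop.io.compress.GzipCodec");
            }
        }
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(outputSettings(new String[] {"mytable", "/export/out", "--compress"}));
        System.out.println(outputSettings(new String[] {"mytable", "/export/out"}));
    }
}
```

Without the switch nothing is set, so the output stays uncompressed as before.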
  • Ted Yu (JIRA) at Feb 13, 2010 at 9:00 pm
    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833453#action_12833453 ]

    Ted Yu commented on HBASE-2225:
    -------------------------------

    Using a command-line switch is fine.
    I think we can make this feature more versatile by naming the switch no_compression_export, meaning that GzipCodec is used for Export by default.

    We detect the compression mode of the table first. If the table is compressed, we don't apply GzipCodec. Otherwise we apply GzipCodec unless no_compression_export is specified.

    Since SequenceFileInputFormat is able to handle GzipCodec, this won't cause a regression for the Import class.

  • Ted Yu (JIRA) at Feb 13, 2010 at 9:40 pm
    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833458#action_12833458 ]

    Ted Yu commented on HBASE-2225:
    -------------------------------

    A little more detail: we iterate through the HColumnDescriptors of the table (HTableDescriptor.getFamilies()). If all column families are compressed, we don't use GzipCodec for Export.
  • Ted Yu (JIRA) at Feb 14, 2010 at 2:59 pm
    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833576#action_12833576 ]

    Ted Yu commented on HBASE-2225:
    -------------------------------

    Here is the code I use:

    // determine if GzipCodec should be used
    HBaseAdmin admin = new HBaseAdmin((HBaseConfiguration)conf);
    HTableDescriptor tableDesc = admin.getTableDescriptor(tableName.getBytes());
    Collection<HColumnDescriptor> families = tableDesc.getFamilies();
    boolean compressed = true;
    for (HColumnDescriptor col : families)
    {
        Compression.Algorithm algo = col.getCompressionType();
        if (algo == Compression.Algorithm.NONE)
        {
            compressed = false;
        }
    }
    if (!compressed)
    {
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, org.apache.hadoop.io.compress.GzipCodec.class);
    }
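
The rule proposed in this and the preceding comments (gzip the export unless every column family is already compressed or the user opts out) can be sketched as a pure function. This is a hedged illustration, not code from the patch: the enum is a stand-in for HBase's Compression.Algorithm, and the class name, method names, and the boolean for the no_compression_export switch are assumptions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative only: the decision rule described in the comments, without HBase types. */
final class ExportCodecDecisionSketch {

    /** Stand-in for HBase's Compression.Algorithm; only the names matter here. */
    enum Algorithm { NONE, GZ, LZO }

    /** True when no family uses NONE; an empty list counts as compressed, as in the snippet above. */
    static boolean allFamiliesCompressed(List<Algorithm> families) {
        for (Algorithm algorithm : families) {
            if (algorithm == Algorithm.NONE) {
                return false;
            }
        }
        return true;
    }

    /** Gzip the export unless the user opted out or every family is already compressed. */
    static boolean shouldGzipExport(List<Algorithm> families, boolean noCompressionExport) {
        if (noCompressionExport) {
            return false;
        }
        return !allFamiliesCompressed(families);
    }

    public static void main(String[] args) {
        List<Algorithm> mixed = Arrays.asList(Algorithm.LZO, Algorithm.NONE);
        List<Algorithm> compressed = Arrays.asList(Algorithm.GZ, Algorithm.LZO);
        System.out.println(shouldGzipExport(mixed, false));
        System.out.println(shouldGzipExport(compressed, false));
        System.out.println(shouldGzipExport(Collections.<Algorithm>emptyList(), false));
    }
}
```

A single uncompressed family is enough to turn gzip on for the whole export under this rule.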

  • Andrew Purtell (JIRA) at Feb 14, 2010 at 6:13 pm
    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833602#action_12833602 ]

    Andrew Purtell commented on HBASE-2225:
    ---------------------------------------

    Need a test for LZO also then?
  • Ted Yu (JIRA) at Feb 14, 2010 at 8:10 pm
    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833617#action_12833617 ]

    Ted Yu commented on HBASE-2225:
    -------------------------------

    If LZO-compressed output is 10% bigger than GZ-compressed output, it may be fine not to compress again with GZ for export.
    I think the command-line switch comes into play when the table is LZO-compressed: it's up to the user of Export to decide.
  • stack (JIRA) at Feb 15, 2010 at 5:01 am
    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833678#action_12833678 ]

    stack commented on HBASE-2225:
    ------------------------------

    bq. We detect compression mode of the table first. If the table is compressed, we don't apply GzipCodec. Otherwise we apply GzipCodec unless no_compression_export is specified.

    Isn't the fact that the table is compressed orthogonal to whether or not the export should be compressed?

    I'd say no compression should be the default. That's how it has been working up to now.

    I'm good with the compression being gzip since, as Lars and Ted say, it's native to SequenceFiles.
  • Ted Yu (JIRA) at Feb 15, 2010 at 5:35 am
    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833685#action_12833685 ]

    Ted Yu commented on HBASE-2225:
    -------------------------------

    Let's implement it using my initial suggestion, with which Lars and Stack concurred.
  • Lars George (JIRA) at Feb 17, 2010 at 8:10 am
    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834680#action_12834680 ]

    Lars George commented on HBASE-2225:
    ------------------------------------

    Agreed. I think what Ted meant (and Andrew also touched on) is that if a table has compression enabled, then it would make sense to use it for backups too, so that small tables, for example, are stored as is. Ted, since the backup reads the KeyValue records, compression is not part of the equation anymore, i.e. the MapReduce job doing the backup does not know whether the table was compressed. I'll implement the command-line switch and attach a patch today.
  • Lars George (JIRA) at Feb 17, 2010 at 1:41 pm
    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Lars George reassigned HBASE-2225:
    ----------------------------------

    Assignee: Lars George
  • Lars George (JIRA) at Feb 17, 2010 at 1:46 pm
    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Lars George updated HBASE-2225:
    -------------------------------

    Attachment: HBASE-2225-trunk.patch

    The patch adds a command-line switch to enable compression and fixes a small typo in the info line logged at startup.
  • Lars George (JIRA) at Feb 17, 2010 at 7:44 pm
    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834958#action_12834958 ]

    Lars George commented on HBASE-2225:
    ------------------------------------

    Trying to test this now and getting:

    {code}
    java.lang.IllegalArgumentException: SequenceFile doesn't work with GzipCodec without native-hadoop code!
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:379)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:347)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:420)
        at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:60)
        at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:71)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:619)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:257)
    {code}

    Is that a change in Hadoop? I am sure this used to work before. Comments? I will look into it.
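
The exception above shows that SequenceFile refuses GzipCodec when the native Hadoop libraries are not loaded. One possible guard, which is not discussed in this thread and is offered only as an assumption, is to fall back to Hadoop's DefaultCodec in that case. The sketch below models just the choice of codec class name; the class and method names are hypothetical, and in a real job the boolean would have to come from Hadoop itself (for example from `NativeCodeLoader.isNativeCodeLoaded()`).

```java
/** Illustrative only: pick a codec class name depending on native library availability. */
final class CodecFallbackSketch {

    static final String GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec";
    static final String DEFAULT_CODEC = "org.apache.hadoop.io.compress.DefaultCodec";

    /** Prefers gzip, but avoids it when the native-hadoop code is unavailable. */
    static String chooseCodec(boolean nativeHadoopLoaded) {
        return nativeHadoopLoaded ? GZIP_CODEC : DEFAULT_CODEC;
    }

    public static void main(String[] args) {
        System.out.println(chooseCodec(true));
        System.out.println(chooseCodec(false));
    }
}
```

Whether such a fallback is desirable for Export is a design question the thread leaves open; failing fast, as the stack trace shows, is the behaviour actually observed.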
  • Lars George (JIRA) at Feb 17, 2010 at 8:00 pm
    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834967#action_12834967 ]

    Lars George commented on HBASE-2225:
    ------------------------------------

    It looks like the same check exists in Hadoop 0.20.1, so it must be due to my local setup. Ted, care to test the change? Otherwise, I tested it with various command-line arguments and it works as expected.
  • Ted Yu (JIRA) at Feb 17, 2010 at 9:55 pm
    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835013#action_12835013 ]

    Ted Yu commented on HBASE-2225:
    -------------------------------

    We use hadoop-0.20.1 and hbase-0.20.1. Would this combination count?
  • Ted Yu (JIRA) at Feb 25, 2010 at 7:34 pm
    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838494#action_12838494 ]

    Ted Yu commented on HBASE-2225:
    -------------------------------

    Would it be useful to add the ability to filter records by selected row key values?

    Thanks
