We use Hadoop/HDFS to archive data. I archive a lot of files by creating one
large tar file and then placing it in HDFS. Is it better to use a Hadoop archive
for this, or is it essentially the same thing?

--
--- Get your facts first, then you can distort them as you please.--


  • Joey Echeverria at Jun 27, 2011 at 2:10 pm
    The advantage of a Hadoop archive file is that it lets you access the
    files stored in it directly. For example, if you archived three files
    (a.txt, b.txt, c.txt) in an archive called foo.har, you could cat one
    of the three files using the hadoop command line:

    hadoop fs -cat har:///user/joey/out/foo.har/a.txt

    You can also copy files out of the archive or use files in the archive
    as input to MapReduce jobs.

    -Joey

    --
    Joseph Echeverria
    Cloudera, Inc.
    443.305.9434
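A minimal sketch of the workflow described above: create the archive, list it, then read and copy individual files through the har:// scheme. The source directory /user/joey/in, the `har_uri` and `run` helpers, and the `RUN` variable are assumptions made for illustration; only the foo.har path and file names come from the post. The `-p` parent argument is the form used by later Hadoop releases, and older releases such as 0.19 may expect different arguments, so check `hadoop archive` usage on your version.

```shell
#!/bin/sh
# Paths below are illustrative; only /user/joey/out/foo.har comes from the thread.
ARCHIVE_DIR=/user/joey/out   # destination directory that will hold foo.har
SRC_PARENT=/user/joey/in     # hypothetical parent directory of a.txt, b.txt, c.txt

# Build the har:// URI for a file inside an archive.
har_uri() {
    printf 'har://%s/%s\n' "$1" "$2"
}

# Print each command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Create the archive (this launches a MapReduce job on the cluster).
run hadoop archive -archiveName foo.har -p "$SRC_PARENT" a.txt b.txt c.txt "$ARCHIVE_DIR"

# List the archive contents, read one file, and copy another out of the archive.
run hadoop fs -ls "$(har_uri "$ARCHIVE_DIR/foo.har" "")"
run hadoop fs -cat "$(har_uri "$ARCHIVE_DIR/foo.har" a.txt)"
run hadoop fs -get "$(har_uri "$ARCHIVE_DIR/foo.har" b.txt)" ./b.txt
```

Run as-is, the script only prints the commands; set RUN=1 on a machine with a configured hadoop client to execute them.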
  • Rita at Jun 27, 2011 at 11:36 pm
    So, it keeps an index of the files?


  • Joey Echeverria at Jun 27, 2011 at 11:47 pm
    Yes, you can see a picture describing HAR files in this old blog post:

    http://www.cloudera.com/blog/2009/02/the-small-files-problem/

    -Joey
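For contrast with the indexed access a HAR provides, here is a rough local sketch of what reading one member from a plain tar file involves. A tar file carries no index, so `tar` scans the stream from the start until it reaches the requested member. In this sketch `cat` stands in for `hadoop fs -cat`, and the temporary directory, file contents, and `read_member` helper are all made up for illustration.

```shell
#!/bin/sh
# Build a small tar file locally; on a cluster it would live in HDFS instead.
workdir=$(mktemp -d)
printf 'alpha\n'   > "$workdir/a.txt"
printf 'bravo\n'   > "$workdir/b.txt"
printf 'charlie\n' > "$workdir/c.txt"
tar -C "$workdir" -cf "$workdir/foo.tar" a.txt b.txt c.txt

# Stream the whole tar through `tar` and write one member to stdout.
# With HDFS, `cat "$1"` would be `hadoop fs -cat <path-to-tar>`.
read_member() {
    cat "$1" | tar -xOf - "$2"
}

read_member "$workdir/foo.tar" b.txt
```

The whole tar file is piped to the client to pull out b.txt, which is the cost the har:// path avoids by looking the file up in the archive's index.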
  • Manhee Jo at Jul 7, 2011 at 1:52 am
    Do you know how to set the number of map/reduce tasks to something other
    than 1 during Hadoop archiving?
    I've tried -Dmapred.map.tasks=2 (we are actually using 0.19.2 :( ) but in
    vain.

    thanks,
    manhee

    ----- Original Message -----
    From: "Joey Echeverria" <[email protected]>
    To: <[email protected]>
    Sent: Tuesday, June 28, 2011 8:46 AM
    Subject: Re: tar or hadoop archive


Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 27, '11 at 10:07a
active: Jul 7, '11 at 1:52a
posts: 5
users: 3
website: hadoop.apache.org...
irc: #hadoop
