Hi,
I plan to stream log data to HDFS using many writers; each writer writes a
stream of data to an HDFS file (which may rotate).

I wonder how many concurrent writers I should use.
If you have experience with this, please share: Hadoop cluster size,
number of writers, replication factor.

Thanks.
Tien
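
A minimal sketch of one such writer, using the standard Hadoop FileSystem
API; the class name, paths, and the size-based rotation policy are
illustrative, not a recommendation:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    /** Writes a stream of records to HDFS, rotating files by size (sketch). */
    public class RotatingHdfsWriter {

        private final FileSystem fs;
        private final String baseDir;   // e.g. "/logs/writer-1" (hypothetical)
        private final long rollBytes;   // rotate once this many bytes are written
        private FSDataOutputStream out;
        private long written;
        private int part;

        public RotatingHdfsWriter(Configuration conf, String baseDir, long rollBytes)
                throws IOException {
            this.fs = FileSystem.get(conf);
            this.baseDir = baseDir;
            this.rollBytes = rollBytes;
            roll();
        }

        public synchronized void write(byte[] record) throws IOException {
            if (written + record.length > rollBytes) {
                roll();                 // rotate to a new file
            }
            out.write(record);
            out.hflush();               // Hadoop 2.x; on 1.x the equivalent is sync()
            written += record.length;
        }

        private void roll() throws IOException {
            if (out != null) {
                out.close();            // closing completes the file's last block
            }
            out = fs.create(new Path(baseDir, "log-" + (part++)));
            written = 0;
        }

        public synchronized void close() throws IOException {
            out.close();
        }
    }

Each writer like this holds one open file, i.e. one write pipeline to
dfs.replication DataNodes, which is what the question about the number of
concurrent writers comes down to.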

  • Yanbo Liang at Aug 6, 2012 at 6:15 am
    You can use Scribe or Flume to collect log data and integrate it with Hadoop.

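    For reference, a minimal Flume (1.x) agent that tails a log file into
    HDFS might look like the following; the agent name, paths, and sizing
    here are hypothetical:

        # flume.conf (sketch)
        agent1.sources  = tail1
        agent1.channels = mem1
        agent1.sinks    = hdfs1

        # Tail the application log (hypothetical path).
        agent1.sources.tail1.type = exec
        agent1.sources.tail1.command = tail -F /var/log/app.log
        agent1.sources.tail1.channels = mem1

        # Buffer events in memory between source and sink.
        agent1.channels.mem1.type = memory
        agent1.channels.mem1.capacity = 10000

        # Write events into HDFS as a plain data stream.
        agent1.sinks.hdfs1.type = hdfs
        agent1.sinks.hdfs1.channel = mem1
        agent1.sinks.hdfs1.hdfs.path = hdfs://namenode:8020/logs/app
        agent1.sinks.hdfs1.hdfs.fileType = DataStream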
  • Alex Baranau at Aug 6, 2012 at 2:59 pm
    Also interested in this question.

    @Yanbo: while we could use third-party tools to import/gather data into
    HDFS, I guess the intention here is to write to HDFS directly. It would
    be great to hear what the "sensible" limits are on the number of files
    one can write to at the same time.

    Thank you in advance,

    Alex Baranau
    ------
    Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
  • Nguyen Manh Tien at Aug 7, 2012 at 1:36 am
    @Yanbo, Alex: I want to develop a custom module that writes directly to
    HDFS. A Flume collector aggregates logs from many sources and writes
    them into a few files. If I instead write to many files (for example,
    one per source), I want to know how many files can be open at once.

    Thanks.
    Tien
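
    A sketch of what such a module could look like, holding one open stream
    per source; the class name and paths are hypothetical:

        import java.io.IOException;
        import java.util.HashMap;
        import java.util.Map;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        /** Keeps one open HDFS file per log source (sketch). */
        public class PerSourceWriter {

            private final FileSystem fs;
            private final Map<String, FSDataOutputStream> streams =
                new HashMap<String, FSDataOutputStream>();

            public PerSourceWriter(Configuration conf) throws IOException {
                this.fs = FileSystem.get(conf);
            }

            public synchronized void append(String source, byte[] record)
                    throws IOException {
                FSDataOutputStream out = streams.get(source);
                if (out == null) {
                    // One open file, and thus one write pipeline, per source.
                    out = fs.create(new Path("/logs/" + source + "/current"));
                    streams.put(source, out);
                }
                out.write(record);
            }

            public synchronized void closeAll() throws IOException {
                for (FSDataOutputStream out : streams.values()) {
                    out.close();
                }
                streams.clear();
            }
        }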
  • Yanbo Liang at Aug 7, 2012 at 6:53 am
    I think there is no hard limit on the number of files one can write to
    at the same time, because each write stream goes out to its own set of
    DataNodes, which most likely differ from stream to stream. It is similar
    to MapReduce output being stored directly as separate files in HDFS,
    where there is likewise no hard limit on the number of files written
    concurrently.

  • Nguyen Manh Tien at Aug 8, 2012 at 2:14 pm
    You are correct.
    I think the bottleneck may be the NameNode when there are too many small
    files; HDFS is designed for big files, not for large numbers of small
    files.
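
    One common mitigation, assuming the records can share a container file,
    is to pack many small records into a single SequenceFile; the paths and
    key/value types below are illustrative:

        import java.io.IOException;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.SequenceFile;
        import org.apache.hadoop.io.Text;

        /** Packs many small records into one SequenceFile (sketch). */
        public class SmallFilePacker {
            public static void main(String[] args) throws IOException {
                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);
                Path out = new Path("/logs/packed/part-00000"); // hypothetical
                SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, out, LongWritable.class, Text.class);
                try {
                    long seq = 0;
                    for (String record : new String[] {"record 1", "record 2"}) {
                        writer.append(new LongWritable(seq++), new Text(record));
                    }
                } finally {
                    writer.close();
                }
            }
        }

    The NameNode then tracks one file instead of many, at the cost of the
    records no longer being individually addressable.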
  • Alo alt at Aug 9, 2012 at 6:52 am
    With Flume you can use batch mode: Flume waits until a given number of
    events has been collected (say, 100) and then bulk-writes them into HDFS.
    On top of that you can set a timeout, meaning that if the batch count is
    not reached within x seconds, the events are written out anyway. That is
    useful for very small files (Avro, for example) and reduces the load on
    the NameNode.

    cheers,
    Alex
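
    For reference, the corresponding knobs on the Flume 1.x HDFS sink could
    look like this; the agent and sink names are hypothetical, and
    rollInterval only approximates the timeout described above:

        # Flush to HDFS every 100 events; roll the file after 30 seconds.
        # rollCount/rollSize are 0 to disable count- and size-based rolling.
        agent1.sinks.hdfs1.hdfs.batchSize = 100
        agent1.sinks.hdfs1.hdfs.rollInterval = 30
        agent1.sinks.hdfs1.hdfs.rollCount = 0
        agent1.sinks.hdfs1.hdfs.rollSize = 0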


Discussion Overview
group: hdfs-user
categories: hadoop
posted: Aug 3, 2012 at 5:34 pm
active: Aug 9, 2012 at 6:52 am
posts: 7
users: 4
website: hadoop.apache.org
irc: #hadoop
