Hi Devaraj, thanks for the reply and suggestion. I had a similar but less
sophisticated checkpoint mechanism.

Another question: why was MapFile (in fact, SequenceFile) made immutable
in the first place? I believe the motivation must make sense, but so far
I don't know what it is.

On 6/27/07, Devaraj Das wrote:

No, you cannot append to a file on the dfs, and your app should be able to
treat multiple files as one single logical file (as you point out). But in
your case, it seems like you could design your app to have some buffering.
For example, you could have a buffer for the n different files, and flush
the buffer to different files on the dfs only when you have reached a
certain limit on the amount of data in the buffer.

I am not sure whether fault handling is of concern to you, but there is
the danger of losing the buffered messages if your app goes down. One way
to handle this, assuming you have the ability to reprocess messages, is to
checkpoint the state of the message processor in the dfs. The state could
include the last message ID you flushed; the next time your app starts up,
it reads the checkpoint file from the dfs, gets the ID, and processes
messages starting from (ID + 1).
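
To make the buffering and checkpointing idea above concrete, here is a
minimal sketch in Python. It is only an illustration under stated
assumptions: a local directory stands in for the dfs, nothing in it is
Hadoop API, message IDs are assumed to arrive in increasing order, and the
names BufferedPartWriter, read_logical_file, and checkpoint.json are
hypothetical.

```python
import json
import os
import tempfile


class BufferedPartWriter:
    """Buffers messages per target and flushes each batch to a NEW part file.

    Assumption: a local directory stands in for the dfs, and message IDs
    arrive in increasing order, so one ID is enough to describe progress.
    """

    def __init__(self, root, flush_limit):
        self.root = root
        self.flush_limit = flush_limit  # buffered messages that trigger a flush
        self.buffers = {}               # target name -> list of (msg_id, text)
        self.buffered = 0
        self.checkpoint_path = os.path.join(root, "checkpoint.json")
        os.makedirs(root, exist_ok=True)

    def last_flushed_id(self):
        """Return the last message ID recorded as flushed, or -1 if none."""
        if not os.path.exists(self.checkpoint_path):
            return -1
        with open(self.checkpoint_path) as f:
            return json.load(f)["last_flushed_id"]

    def add(self, msg_id, target, text):
        self.buffers.setdefault(target, []).append((msg_id, text))
        self.buffered += 1
        if self.buffered >= self.flush_limit:
            self.flush()

    def flush(self):
        if self.buffered == 0:
            return
        last_id = -1
        for target, msgs in self.buffers.items():
            target_dir = os.path.join(self.root, target)
            os.makedirs(target_dir, exist_ok=True)
            # Never append: each flush creates the next numbered part file.
            part_name = "part-%05d" % len(os.listdir(target_dir))
            with open(os.path.join(target_dir, part_name), "w") as f:
                for msg_id, text in msgs:
                    f.write("%d\t%s\n" % (msg_id, text))
            last_id = max(last_id, max(msg_id for msg_id, _ in msgs))
        # Record the checkpoint only after the data files are written, and
        # swap it in with a rename so a reader never sees a partial file.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"last_flushed_id": last_id}, f)
        os.replace(tmp_path, self.checkpoint_path)
        self.buffers.clear()
        self.buffered = 0


def read_logical_file(root, target):
    """Read all part files of one target, in order, as a single logical file."""
    target_dir = os.path.join(root, target)
    lines = []
    for name in sorted(os.listdir(target_dir)):
        with open(os.path.join(target_dir, name)) as f:
            lines.extend(line.rstrip("\n") for line in f)
    return lines


# Demo: (message ID, target file, text); the IDs double as list indexes here.
messages = [(0, "a", "m0"), (1, "b", "m1"), (2, "a", "m2"),
            (3, "b", "m3"), (4, "a", "m4")]
root = tempfile.mkdtemp()
writer = BufferedPartWriter(root, flush_limit=3)
for msg_id, target, text in messages:
    writer.add(msg_id, target, text)

# IDs 0..2 were flushed; 3 and 4 exist only in memory. Simulate a crash by
# abandoning the writer, then restart from the checkpoint at (ID + 1).
restarted = BufferedPartWriter(root, flush_limit=3)
resume_from = restarted.last_flushed_id() + 1
for msg_id, target, text in messages[resume_from:]:
    restarted.add(msg_id, target, text)
restarted.flush()
logical_a = read_logical_file(root, "a")
```

If the app dies between writing the part files and writing the checkpoint,
the messages of that batch are reprocessed on restart and show up twice,
so a real version would need to tolerate or filter duplicates.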

-----Original Message-----
From: Open Study
Sent: Tuesday, June 26, 2007 8:42 PM
To: hadoop-user@lucene.apache.org
Subject: is that possible to make MapFile "mutable" ?

Hi all,

MapFile doesn't support an append mode of creation, so an existing MapFile
is overwritten every time a new one with the same name is created.

Is there any way I can append to a MapFile or the like without erasing the
old content? Or does that not make sense at all?

In my scenario I need to split a large number of messages (tens of
millions) according to certain rules and put them into different MapFiles,
which are supposed to be updated when new messages come in. Since I didn't
find a way to make a MapFile appendable, I have to create new MapFiles, so
one MapFile can contain as little as one message in the worst case, and I
will have to merge them with their proper siblings later.
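
The later merge step could look roughly like the sketch below. It is only
an illustration: in-memory lists of (key, value) pairs, each already sorted
by key, stand in for the small MapFiles, and merge_sorted_parts is a
hypothetical helper rather than anything provided by Hadoop.

```python
import heapq


def merge_sorted_parts(parts):
    """Merge several key-sorted lists of (key, value) pairs into one sorted list.

    heapq.merge reads the inputs lazily and keeps only one pending item per
    input, so the same approach works when the parts are streamed from files.
    """
    return list(heapq.merge(*parts, key=lambda kv: kv[0]))


# Three small siblings, one of them holding a single message.
parts = [
    [(1, "a"), (4, "d")],
    [(2, "b")],
    [(3, "c"), (5, "e")],
]
merged = merge_sorted_parts(parts)
```

Entries with equal keys are all kept, in the order of the inputs, so a real
merge would still have to decide which duplicate wins.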

Regards

Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 26, '07 at 3:12p
active: Jun 27, '07 at 3:34a
posts: 4
users: 2
website: hadoop.apache.org...
irc: #hadoop
