[ https://issues.apache.org/jira/browse/HADOOP-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468453 ]

Hairong Kuang commented on HADOOP-928:
--------------------------------------

Yes, all you said makes sense. But I'd like to make two changes to your original proposal:
1. keep all the raw methods in the FileSystem interface.
2. Remove Summer from FSDataOutputStream and make it an interface; remove Checker from FSDataInputStream and make it an interface. So DistributedFileSystem can have its own checksum implementation.

So FileSystem, ChecksumFileSystem, DistributedFileSystem/LocalFileSystem would be like the following:
1. FileSystem
For all the raw methods:
* openRaw returns an FSInputStream specific to each file system;
* createRaw returns an FSOutputStream specific to each file system;
* deleteRaw deletes the given file;
and so on.
For all the non-raw methods:
* open returns a buffered, position-cached DataInputStream;
* create returns a buffered, position-cached DataOutputStream;
* delete is the same as deleteRaw;
and so on.

2. ChecksumFileSystem
* open returns a buffered, position-cached, checksummed DataInputStream;
* create returns a buffered, position-cached, checksummed DataOutputStream;
* delete removes the given file and its checksum file;
and so on.

3. DistributedFileSystem and LocalFileSystem extend ChecksumFileSystem and implement all the raw methods.
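
The layering described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Hadoop code: SketchFileSystem, SketchChecksumFileSystem and InMemoryFileSystem are hypothetical names, paths are plain strings, the ".crc" suffix for the checksum file is assumed, one CRC32 covers the whole file, and buffering and position caching are left out to keep it short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileNotFoundException;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical stand-in for the proposed FileSystem base class.
abstract class SketchFileSystem {
    // Raw methods: each concrete file system supplies its own streams.
    abstract InputStream openRaw(String path) throws IOException;
    abstract OutputStream createRaw(String path) throws IOException;
    abstract boolean deleteRaw(String path) throws IOException;

    // Non-raw methods: no checksums at this level (buffering omitted here).
    InputStream open(String path) throws IOException { return openRaw(path); }
    OutputStream create(String path) throws IOException { return createRaw(path); }
    boolean delete(String path) throws IOException { return deleteRaw(path); }
}

// Layers checksums over the raw methods of whichever file system extends it.
abstract class SketchChecksumFileSystem extends SketchFileSystem {
    // Assumed naming scheme for the checksum side file.
    static String checksumPath(String path) { return path + ".crc"; }

    @Override
    OutputStream create(String path) throws IOException {
        final OutputStream sums = createRaw(checksumPath(path));
        final CRC32 crc = new CRC32();
        return new FilterOutputStream(createRaw(path)) {
            @Override public void write(int b) throws IOException {
                crc.update(b);   // every byte written also feeds the checksum
                out.write(b);
            }
            @Override public void close() throws IOException {
                super.close();   // flush and close the data file first
                try (DataOutputStream d = new DataOutputStream(sums)) {
                    d.writeLong(crc.getValue());
                }
            }
        };
    }

    @Override
    InputStream open(String path) throws IOException {
        // Simplification: read and verify the whole file before returning it.
        byte[] data;
        try (InputStream in = openRaw(path)) { data = in.readAllBytes(); }
        long expected;
        try (DataInputStream d = new DataInputStream(openRaw(checksumPath(path)))) {
            expected = d.readLong();
        }
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        if (crc.getValue() != expected) throw new IOException("Checksum error: " + path);
        return new ByteArrayInputStream(data);
    }

    @Override
    boolean delete(String path) throws IOException {
        deleteRaw(checksumPath(path));   // remove the checksum file as well
        return deleteRaw(path);
    }
}

// Toy concrete file system: implements only the raw methods.
class InMemoryFileSystem extends SketchChecksumFileSystem {
    final Map<String, byte[]> files = new HashMap<>();

    @Override InputStream openRaw(String path) throws IOException {
        byte[] bytes = files.get(path);
        if (bytes == null) throw new FileNotFoundException(path);
        return new ByteArrayInputStream(bytes);
    }
    @Override OutputStream createRaw(final String path) {
        return new ByteArrayOutputStream() {
            // The file becomes visible when the stream is closed.
            @Override public void close() { files.put(path, toByteArray()); }
        };
    }
    @Override boolean deleteRaw(String path) { return files.remove(path) != null; }
}
```

In this sketch the concrete file system never mentions checksums; it inherits them from the checksum layer, which is the point of item 3. A file system wanting its own checksum scheme would override open, create and delete instead.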

make checksums optional per FileSystem
--------------------------------------

Key: HADOOP-928
URL: https://issues.apache.org/jira/browse/HADOOP-928
Project: Hadoop
Issue Type: Improvement
Components: fs
Reporter: Doug Cutting
Assigned To: Hairong Kuang

Checksumming is currently built into the base FileSystem class. It should instead be optional, with each FileSystem implementation electing whether to use the Hadoop-provided checksum system, or to disable it, or to implement its own custom checksum system.
To implement this, a ChecksumFileSystem implementation can be provided that wraps another FileSystem implementation, implementing checksums as in Hadoop's current mandatory implementation (i.e., as a separate crc file per file that's elided from directory listings). The 'raw' FileSystem methods would be removed. FSDataInputStream and FSDataOutputStream would be made interfaces.
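
As a rough illustration of the wrapping idea in the description, the sketch below shows only the "elided from directory listings" part. ListingFileSystem and CrcHidingFileSystem are hypothetical names, and the ".crc" suffix is an assumed convention rather than something taken from the Hadoop source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal file system interface; only listing is modelled.
interface ListingFileSystem {
    List<String> list(String dir);
}

// Wraps any ListingFileSystem and hides checksum side files from listings.
class CrcHidingFileSystem implements ListingFileSystem {
    private final ListingFileSystem inner;

    CrcHidingFileSystem(ListingFileSystem inner) { this.inner = inner; }

    // Assumed convention: the checksum for "name" lives in "name.crc".
    static boolean isChecksumFile(String name) { return name.endsWith(".crc"); }

    @Override
    public List<String> list(String dir) {
        List<String> visible = new ArrayList<>();
        for (String name : inner.list(dir)) {
            if (!isChecksumFile(name)) visible.add(name);   // skip crc files
        }
        return visible;
    }
}
```

Because the wrapper only depends on the interface, any underlying file system can be given checksum behaviour, or left without it, by choosing whether to wrap it.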
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Discussion Overview
group: common-dev @
categories: hadoop
posted: Jan 25, '07 at 4:19a
active: May 17, '07 at 11:28a
posts: 29
users: 1
website: hadoop.apache.org...
irc: #hadoop

1 user in discussion

Hadoop QA (JIRA): 29 posts