Hairong Kuang commented on HADOOP-928:
--------------------------------------
Yes, everything you said makes sense, but I'd like to make two changes to your original proposal:
1. Keep all the raw methods in the FileSystem interface.
2. Remove Summer from FSDataOutputStream and make it an interface; likewise remove Checker from FSDataInputStream and make it an interface, so that DistributedFileSystem can have its own checksum implementation.
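To make the second change concrete, here is a rough sketch of what pulling Summer and Checker out into interfaces might look like. The names SketchSummer, SketchChecker and ZipChecksumHooks, and all method signatures, are assumptions made for illustration; they are not the actual Hadoop types.

```java
import java.io.IOException;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical write-side hook: accumulates a checksum over written bytes. */
interface SketchSummer {
    void update(byte[] data, int off, int len);
    long sum();
}

/** Hypothetical read-side hook: verifies bytes against a stored checksum. */
interface SketchChecker {
    void check(byte[] data, int off, int len, long expected) throws IOException;
}

/** One possible implementation, backed by any java.util.zip.Checksum. */
class ZipChecksumHooks implements SketchSummer, SketchChecker {
    private final Checksum checksum;

    ZipChecksumHooks(Checksum checksum) { this.checksum = checksum; }

    // Two file systems could pick different algorithms behind the same interfaces.
    static ZipChecksumHooks crc32() { return new ZipChecksumHooks(new CRC32()); }
    static ZipChecksumHooks adler32() { return new ZipChecksumHooks(new Adler32()); }

    @Override
    public void update(byte[] data, int off, int len) { checksum.update(data, off, len); }

    @Override
    public long sum() { return checksum.getValue(); }

    @Override
    public void check(byte[] data, int off, int len, long expected) throws IOException {
        // Recompute from scratch and compare with the stored value.
        checksum.reset();
        checksum.update(data, off, len);
        if (checksum.getValue() != expected) {
            throw new IOException("Checksum mismatch");
        }
    }
}
```

With hooks like these, the stream classes would only depend on the interfaces, and a file system such as DistributedFileSystem could supply its own implementation.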
So FileSystem, ChecksumFileSystem, DistributedFileSystem/LocalFileSystem would be like the following:
1. FileSystem
For all the raw methods:
* openRaw returns an FSInputStream specific to each file system
* createRaw returns an FSOutputStream specific to each file system
* deleteRaw deletes the given file
and so on.
For all the non-raw methods:
* open returns a buffered, position-cached DataInputStream;
* create returns a buffered, position-cached DataOutputStream;
* delete is the same as deleteRaw;
and so on.
2. ChecksumFileSystem
* open returns a buffered, position-cached, checksummed DataInputStream;
* create returns a buffered, position-cached, checksummed DataOutputStream;
* delete removes the given file and its checksum file
and so on.
3. DistributedFileSystem and LocalFileSystem extend ChecksumFileSystem and implement all the raw methods.
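The three-level layout above could be sketched roughly as follows. This is a simplified illustration, not the real Hadoop code: the Sketch* class names, the String paths, the in-memory store, the ".crc" suffix, and the single whole-file CRC32 (rather than per-chunk checksums and position caching) are all assumptions.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

/** Hypothetical stand-in for FileSystem: raw methods are abstract, non-raw ones add buffering. */
abstract class SketchFileSystem {
    abstract InputStream openRaw(String path) throws IOException;
    abstract OutputStream createRaw(String path) throws IOException;
    abstract boolean deleteRaw(String path);

    InputStream open(String path) throws IOException {
        return new BufferedInputStream(openRaw(path));
    }
    OutputStream create(String path) throws IOException {
        return new BufferedOutputStream(createRaw(path));
    }
    boolean delete(String path) { return deleteRaw(path); }
}

/** Hypothetical stand-in for ChecksumFileSystem: one CRC32 per file, kept in a sibling file. */
abstract class SketchChecksumFileSystem extends SketchFileSystem {
    static String checksumPath(String path) { return path + ".crc"; } // naming is an assumption

    @Override
    OutputStream create(String path) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        return new FilterOutputStream(buffer) {
            @Override
            public void close() throws IOException {
                // On close, write the data and then its checksum via the raw methods.
                byte[] data = buffer.toByteArray();
                try (OutputStream raw = createRaw(path)) { raw.write(data); }
                try (DataOutputStream sum = new DataOutputStream(createRaw(checksumPath(path)))) {
                    sum.writeLong(crcOf(data));
                }
            }
        };
    }

    @Override
    InputStream open(String path) throws IOException {
        byte[] data;
        try (InputStream raw = openRaw(path)) { data = raw.readAllBytes(); }
        long expected;
        try (DataInputStream sum = new DataInputStream(openRaw(checksumPath(path)))) {
            expected = sum.readLong();
        }
        if (crcOf(data) != expected) throw new IOException("Checksum error: " + path);
        return new ByteArrayInputStream(data);
    }

    @Override
    boolean delete(String path) {
        deleteRaw(checksumPath(path)); // remove the checksum file as well
        return deleteRaw(path);
    }

    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }
}

/** Hypothetical concrete file system: implements only the raw methods. */
class SketchInMemoryFileSystem extends SketchChecksumFileSystem {
    final Map<String, byte[]> files = new HashMap<>();

    @Override
    InputStream openRaw(String path) throws IOException {
        byte[] data = files.get(path);
        if (data == null) throw new FileNotFoundException(path);
        return new ByteArrayInputStream(data);
    }
    @Override
    OutputStream createRaw(String path) {
        return new ByteArrayOutputStream() {
            @Override
            public void close() { files.put(path, toByteArray()); }
        };
    }
    @Override
    boolean deleteRaw(String path) { return files.remove(path) != null; }
}
```

The point of the layering is that a concrete file system only supplies the raw methods and inherits checksummed open, create and delete; a file system that wants no checksums could extend the base class directly instead.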
make checksums optional per FileSystem
--------------------------------------
Key: HADOOP-928
URL: https://issues.apache.org/jira/browse/HADOOP-928
Project: Hadoop
Issue Type: Improvement
Components: fs
Reporter: Doug Cutting
Assigned To: Hairong Kuang
Checksumming is currently built into the base FileSystem class. It should instead be optional, with each FileSystem implementation electing whether to use the Hadoop-provided checksum system, or to disable it, or to implement its own custom checksum system.
To implement this, a ChecksumFileSystem implementation can be provided that wraps another FileSystem implementation, implementing checksums as in Hadoop's current mandatory implementation (i.e., as a separate crc file per file that's elided from directory listings). The 'raw' FileSystem methods would be removed. FSDataInputStream and FSDataOutputStream would be made interfaces.
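As an illustration of the wrapping approach in the description, the following sketch shows a wrapper that hides checksum files from directory listings. SketchListableFs and SketchChecksumWrapper are hypothetical names, the single list method is a deliberate simplification, and the ".crc" suffix test is an assumption about how the checksum files are named.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal file system interface with only a listing call. */
interface SketchListableFs {
    List<String> list(String dir);
}

/** Wraps another file system and elides its checksum files from listings. */
class SketchChecksumWrapper implements SketchListableFs {
    private final SketchListableFs inner;

    SketchChecksumWrapper(SketchListableFs inner) { this.inner = inner; }

    /** Assumed convention: checksum files end in ".crc". */
    static boolean isChecksumFile(String name) {
        return name.endsWith(".crc");
    }

    @Override
    public List<String> list(String dir) {
        List<String> visible = new ArrayList<>();
        for (String name : inner.list(dir)) {
            if (!isChecksumFile(name)) {
                visible.add(name); // keep only user-visible files
            }
        }
        return visible;
    }
}
```

Because the wrapper implements the same interface as the file system it wraps, callers would not need to know whether checksumming is enabled.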