[ https://issues.apache.org/jira/browse/HADOOP-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476040 ]

Doug Cutting commented on HADOOP-928:

Sorry, this patch does not apply cleanly to current trunk. Other patches were recently committed that conflict with it.

Do FSInputChecker and FSOutputSummer need to be public? Aren't they only used by ChecksumFileSystem? Eventually we might want to expose these publicly, if, e.g., HDFS's eventual built-in checksum implementation shares code with them, but we also might not, since that mechanism might be independent. So, for now, we should probably keep these package-private or even priavate within ChecksumFileSystem, the only place they're used.

In ChecksumFileSystem#create(Path, int bufferSize) it looks like two buffers of bufferSize are created. I think only the inner buffer, created by the underlying raw filesystem, should be that size, which can be quite large, while the outer buffer should be quite small, no larger than bytesPerSum.

Similarly, in ChecksumFileSystem#open(Path, int bufferSize), I think the inner buffer should be large, to minimize seeks, system calls, etc., while the outer buffer should be quite small, no larger than bytesPerSum.
make checksums optional per FileSystem

Key: HADOOP-928
URL: https://issues.apache.org/jira/browse/HADOOP-928
Project: Hadoop
Issue Type: Improvement
Components: fs
Reporter: Doug Cutting
Assigned To: Hairong Kuang
Attachments: checksum.patch, checksum1.patch

Checksumming is currently built into the base FileSystem class. It should instead be optional, with each FileSystem implementation electing whether to use the Hadoop-provided checksum system, or to disable it, or to implement its own custom checksum system.
To implement this, a ChecksumFileSystem implementation can be provided that wraps another FileSystem implementation, implementing checksums as in Hadoop's current mandatory implementation (i.e., as a separate crc file per file that's elided from directory listings). The 'raw' FileSystem methods would be removed. FSDataInputStream and FSDataOutputStream would be made interfaces.
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

Search Discussions

Discussion Posts


Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 18 of 29 | next ›
Discussion Overview
groupcommon-dev @
postedJan 25, '07 at 4:19a
activeMay 17, '07 at 11:28a

1 user in discussion

Hadoop QA (JIRA): 29 posts



site design / logo © 2022 Grokbase