Can't read binary data off HDFS

Key: HDFS-1169
URL: https://issues.apache.org/jira/browse/HDFS-1169
Project: Hadoop HDFS
Issue Type: Bug
Components: contrib/thriftfs
Affects Versions: 0.20.2
Reporter: Erik Forsberg

When I try to access binary data stored in HDFS (in my case, TypedByte files generated by Dumbo) through Thrift, talking to org.apache.hadoop.thriftfs.HadoopThriftServer, the data I get back is mangled. For example, when I read a file that contains the byte 0xa2, it comes back as 0xef 0xbf 0xbd, which is the UTF-8 encoding of the Unicode replacement character (U+FFFD).

I think this is because the read method in HadoopThriftServer.java decodes the data read from HDFS as UTF-8 text via the String() constructor, so any bytes that are not valid UTF-8 get replaced.
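A minimal sketch of the effect, not the actual HadoopThriftServer code: it assumes the bytes are decoded as UTF-8 with the String constructor and later re-encoded as UTF-8 on the wire. The class and method names (BinaryRoundTrip, viaString, hex) are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class BinaryRoundTrip {
    // Decode raw bytes as UTF-8 text and encode them again, mimicking
    // what a string-typed read path would do to the data (assumption).
    static byte[] viaString(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Format bytes as space-separated hex, e.g. "0xef 0xbf 0xbd".
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0xa2 on its own is a stray continuation byte, so it is not valid UTF-8.
        byte[] raw = { (byte) 0xa2 };
        // prints "0xa2 -> 0xef 0xbf 0xbd"
        System.out.println(hex(raw) + " -> " + hex(viaString(raw)));
    }
}
```

The single input byte turns into three bytes, and the original value cannot be recovered from them, which matches what I see on the client side.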

This essentially makes the HDFS thrift API useless for me :-(.

I'm not an expert on Thrift, but would it be possible to modify the API so that it uses the binary type listed on http://wiki.apache.org/thrift/ThriftTypes?
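As a rough sketch of what I mean, assuming the read call returned raw bytes rather than a string (recent Thrift Java bindings represent binary as java.nio.ByteBuffer, as far as I know): the handler would wrap the bytes without ever decoding them. BinaryReadSketch and readBinary are hypothetical names, not part of the existing thriftfs API.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BinaryReadSketch {
    // Hypothetical stand-in for a read handler whose Thrift return type is
    // `binary` instead of `string`: the bytes are copied and wrapped, never decoded.
    static ByteBuffer readBinary(byte[] fileContents, int offset, int length) {
        int end = Math.min(fileContents.length, offset + length);
        return ByteBuffer.wrap(Arrays.copyOfRange(fileContents, offset, end));
    }

    public static void main(String[] args) {
        // Includes 0xa2 and 0xff, neither of which is valid UTF-8 on its own.
        byte[] stored = { 0x01, (byte) 0xa2, (byte) 0xff };
        ByteBuffer chunk = readBinary(stored, 0, stored.length);
        byte[] received = new byte[chunk.remaining()];
        chunk.get(received);
        // prints "true": every byte arrives unchanged
        System.out.println(Arrays.equals(stored, received));
    }
}
```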


Posted to hdfs-dev on May 21, '10 at 9:45a
