Hi,

I set an org.apache.hadoop.io.Text object to the Chinese string "測試",
which is six bytes in UTF-8. When I call its getBytes() method, the
returned array is 11 bytes long, even though only six of those bytes are
valid data. I know the object also provides a getLength() method.
My question is: why doesn't getBytes() return only the valid bytes?

Why isn't it implemented like this?
public byte[] getBytes() {
    // return a copy trimmed to the valid length
    return Arrays.copyOf(bytes, getLength());
}

Thanks in advance,

Shen

  • Todd Lipcon at Oct 31, 2009 at 5:25 pm

    On Sat, Oct 31, 2009 at 8:02 AM, ChingShen wrote:

    > My question is: why doesn't getBytes() return only the valid bytes?
    >
    > Why isn't it implemented like this?
    > public byte[] getBytes() {
    >     // return a copy trimmed to the valid length
    >     return Arrays.copyOf(bytes, getLength());
    > }

    Simply to avoid an extra copy, which would needlessly slow down a lot
    of use cases. getBytes() returns the backing array, and only the first
    getLength() bytes of it are valid. You're always free to make the
    trimmed copy yourself.

    -Todd
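
As a rough illustration of the trade-off Todd describes, the Java sketch below uses a hypothetical stand-in class, MiniText (not Hadoop's Text), that keeps a reusable backing array plus a separate valid length. The class, its never-shrinking growth policy, and the helpers validBytes and decode are assumptions made for this example, and "hello world" is only an assumed earlier value that happens to leave the buffer at 11 bytes. With the real Text, the equivalent trimming call would be Arrays.copyOf(text.getBytes(), text.getLength()).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BackingArrayDemo {

    /** Hypothetical stand-in for a reusable byte container; NOT Hadoop's Text. */
    static final class MiniText {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Stores the UTF-8 encoding of s, reusing the buffer when it is big enough. */
        void set(String s) {
            byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
            if (encoded.length > bytes.length) {
                // Grow only when needed; the buffer never shrinks (assumed policy).
                bytes = new byte[encoded.length];
            }
            System.arraycopy(encoded, 0, bytes, 0, encoded.length);
            length = encoded.length;
        }

        /** Returns the backing array itself; only the first getLength() bytes are valid. */
        byte[] getBytes() { return bytes; }

        int getLength() { return length; }
    }

    /** Makes the trimmed copy the caller asked for (one extra allocation and copy). */
    static byte[] validBytes(MiniText t) {
        return Arrays.copyOf(t.getBytes(), t.getLength());
    }

    /** Reads the valid range directly from the backing array, without a trimmed copy. */
    static String decode(MiniText t) {
        return new String(t.getBytes(), 0, t.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        MiniText t = new MiniText();
        t.set("hello world");       // 11 ASCII bytes, so the buffer grows to 11
        t.set("\u6e2c\u8a66");      // the two characters from the question: 6 UTF-8 bytes

        System.out.println(t.getBytes().length);   // 11 (buffer capacity)
        System.out.println(t.getLength());         // 6  (valid bytes)
        System.out.println(validBytes(t).length);  // 6  (trimmed copy)
    }
}
```

Callers that only read or write the data can pass the backing array with an offset and length, as decode does here, and skip the copy entirely; validBytes is for the cases that really need an exact-length array.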

Discussion Overview
group: common-user @
categories: hadoop
posted: Oct 31, '09 at 3:03p
active: Oct 31, '09 at 5:25p
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop