I'm working on a cryptography assignment in which we are supposed to
analyze ciphertexts for letter frequency. However, I noticed that if you
iterate over a byte array and convert each byte to a string, many of the
bytes convert to two-byte strings.

I put together an example here: http://play.golang.org/p/xZbFTsBW00

Why is this happening? How could a single byte be converted to a two-byte
string?

--


  • Chris Dollin at Nov 16, 2012 at 9:55 am

    string(aninteger) doesn't convert how you think it would. It delivers
    a string whose bytes form the UTF-8 representation of the argument
    value, i.e., it makes a one-rune string from a rune (a Unicode code point).
    Such a string may be more than one byte long.

    http://golang.org/ref/spec#Conversions
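
A minimal sketch of the difference described above, assuming nothing beyond the standard library. The helper names asRune and asByte are made up for illustration and are not taken from the linked playground examples.

```go
package main

import "fmt"

// asRune converts b the way string(integer) does: b is treated as a
// Unicode code point and encoded as UTF-8.
func asRune(b byte) string {
	return string(rune(b))
}

// asByte builds a string that holds exactly the one raw byte b.
func asByte(b byte) string {
	return string([]byte{b})
}

func main() {
	// Values below 0x80 stay one byte either way; values from 0x80 up
	// become two bytes when treated as a code point.
	for _, b := range []byte{0x41, 0x7F, 0x80, 0xE9, 0xFF} {
		r, raw := asRune(b), asByte(b)
		fmt.Printf("0x%02X  as rune: % X (len %d)  as raw byte: % X (len %d)\n",
			b, r, len(r), raw, len(raw))
	}
}
```

Only the conversion through []byte keeps the one-byte-in, one-byte-out property that a frequency count over raw ciphertext bytes relies on.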

    Chris

    --
    Chris "allusive" Dollin

    --
  • Dan Kortschak at Nov 16, 2012 at 10:30 am
    http://play.golang.org/p/4yvIy5MX0W demonstrates explicitly.
    --
  • Kyle Lemons at Nov 16, 2012 at 10:36 pm
    When you decode the ciphertext, you're getting a string of raw bytes. Many
    of these (if it's a good crypto algorithm) will be >127, which is not valid
    ASCII, so when you convert such a byte with string(b), you get its UTF-8
    encoding, which is two bytes long. If you're looking for a histogram of the
    *bytes* in the ciphertext, just use a map[byte]int:



    --
  • Kyle Lemons at Nov 16, 2012 at 10:31 pm

    Apparently I hit send instead of paste:
    http://play.golang.org/p/XpqFpETnz0
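
For readers who cannot open the playground link, here is a rough sketch of a byte histogram built on map[byte]int. It is not the linked code; the function name byteHistogram and the sample bytes are invented for illustration.

```go
package main

import "fmt"

// byteHistogram counts how often each byte value occurs in data.
func byteHistogram(data []byte) map[byte]int {
	counts := make(map[byte]int)
	for _, b := range data {
		counts[b]++
	}
	return counts
}

func main() {
	// Made-up sample bytes standing in for a decoded ciphertext.
	ciphertext := []byte{0x8F, 0x41, 0x8F, 0xE9, 0x41, 0x8F}
	counts := byteHistogram(ciphertext)

	// Walk all 256 possible values so the output order is stable
	// (ranging over a map visits keys in no particular order).
	for v := 0; v < 256; v++ {
		if n := counts[byte(v)]; n > 0 {
			fmt.Printf("0x%02X: %d\n", v, n)
		}
	}
}
```

Because the key is the byte itself, no string conversion takes place and values above 127 are counted like any other.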


    --
  • Kyle Lemons at Nov 17, 2012 at 12:12 am
    You can use a [2]byte as a map key.

    On Fri, Nov 16, 2012 at 3:14 PM, Skyler wrote:

    Well, this is what I did at first, but after counting single characters, I
    want to count the digrams and trigrams, and I can't think of a sane way to
    do that besides just encapsulating them in a string.

    --
  • Kyle Lemons at Nov 17, 2012 at 1:11 am
    Luckily, [2]byte is not a slice :)

    http://play.golang.org/p/uAITHlwGlN
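
A short sketch of the idea, not a copy of the linked example: a fixed-size array such as [2]byte is comparable, so it can serve as a map key where a slice cannot. The function name digramCounts and the sample data are assumptions made for illustration.

```go
package main

import "fmt"

// digramCounts counts every overlapping pair of adjacent bytes in data.
// The key type [2]byte is an array, which is comparable and therefore
// allowed as a map key.
func digramCounts(data []byte) map[[2]byte]int {
	counts := make(map[[2]byte]int)
	for i := 0; i+1 < len(data); i++ {
		counts[[2]byte{data[i], data[i+1]}]++
	}
	return counts
}

func main() {
	// Made-up sample bytes standing in for a decoded ciphertext.
	data := []byte{0x8F, 0x41, 0x8F, 0x41, 0xE9}
	for pair, n := range digramCounts(data) {
		fmt.Printf("%02X %02X: %d\n", pair[0], pair[1], n)
	}
}
```

Trigrams would follow the same pattern with a [3]byte key and a window of three bytes.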

    On Fri, Nov 16, 2012 at 5:03 PM, Skyler wrote:

    Slices aren't allowed as map keys

    --
  • Bryan Mills at Nov 18, 2012 at 9:29 pm
    Alternatively, if you want to remove the fixed length you can put the byte
    in a singleton []byte and then convert that to a string:
    http://play.golang.org/p/gsAGy8NFW8

    That allows you to distinguish between single bytes and digraphs with zero
    as the second byte, at the cost of marginally higher runtime overhead for
    storing the lengths.
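
The approach above can be sketched roughly as follows. This is an illustration rather than the linked code, and the function name ngramCounts and the sample data are assumptions. Converting a []byte to a string copies the raw bytes unchanged, so no UTF-8 encoding step is involved.

```go
package main

import "fmt"

// ngramCounts counts every overlapping run of n adjacent bytes in data.
// string(data[i:i+n]) copies the raw bytes as they are, so each key is
// exactly n bytes long whatever the byte values are.
func ngramCounts(data []byte, n int) map[string]int {
	counts := make(map[string]int)
	if n <= 0 {
		return counts
	}
	for i := 0; i+n <= len(data); i++ {
		counts[string(data[i:i+n])]++
	}
	return counts
}

func main() {
	// Made-up sample bytes, including zero bytes, standing in for a
	// decoded ciphertext.
	data := []byte{0x8F, 0x00, 0x8F, 0x00, 0xE9}
	for n := 1; n <= 3; n++ {
		for key, count := range ngramCounts(data, n) {
			fmt.Printf("n=%d  % X: %d\n", n, key, count)
		}
	}
}
```

Since the key carries its own length, the one-byte key "\x8f" and the two-byte key "\x8f\x00" stay distinct, which a zero-padded fixed-size array could not guarantee.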

    --

Discussion Overview
group: golang-nuts
categories: go
posted: Nov 16, '12 at 9:22a
active: Nov 18, '12 at 9:29p
posts: 8
users: 5
website: golang.org
