I have a table in my default database with 5 columns. When I look at the
sample in the Hue Metastore Manager it looks fine. However, when I do a
"select * from table" in the Impala query editor, the result has all of the
data squished into the first column and nothing in the remaining 4. Can
anyone tell me how to fix this?

Thanks,

Alex

  • Udai Kiran Potluri at Jun 3, 2013 at 6:17 pm
    Hi Alex,

    What do you see when you run the query on Impala shell?

    Thanks,
    Udai

  • Alex Minnaar at Jun 3, 2013 at 6:28 pm
    I've just been using the web UI. I'm not sure how to use the Impala shell.
    If you could tell me how, that would be great ;)
  • Ricky Saltzer at Jun 3, 2013 at 6:39 pm
    Hey Alex -

    If you log into one of the Impala nodes via SSH, you should be able to
    access the shell by running the following:

    $ impala-shell

    Hope this helps
    Ricky


    --
    Ricky Saltzer
    Tools Developer
    http://www.cloudera.com
  • Alex Minnaar at Jun 3, 2013 at 6:40 pm
    OK, disregard my last response. I was able to run it in the Impala shell
    and got the same result. All of the data was in the first column; the rest
    were empty.
  • Ricky Saltzer at Jun 3, 2013 at 6:42 pm
    Hey Alex -

    Can you open up a Hive shell ($ hive) and perform an extended describe on
    the table?

    $ hive
    hive> describe formatted <table_name>;
    or
    hive> describe extended <table_name>;

    The formatted table description is nicer to read...

    Thanks!
    Ricky

  • Ricky Saltzer at Jun 3, 2013 at 6:57 pm
    Hey Alex -

    I probably should have also asked whether you can query this table in Hive
    okay.

    Try:
    $ hive
    hive> select * from <table_name> LIMIT 10;

    How does the output look?

    On Mon, Jun 3, 2013 at 2:55 PM, Alex Minnaar wrote:

    The result of that is

    # col_name data_type comment

    one string None
    two string None
    three string None
    four string None
    five string None

    # Detailed Table Information
    Database: default
    Owner: hdfs
    CreateTime: Mon Jun 03 15:38:28 UTC 2013
    LastAccessTime: UNKNOWN
    Protect Mode: None
    Retention: 0
    Location: hdfs://ip-10-245-112-238.us-west-2.compute.internal:8020/user/hive/warehouse/text_try
    Table Type: MANAGED_TABLE
    Table Parameters:
    numFiles 1
    numPartitions 0
    numRows 0
    rawDataSize 0
    totalSize 48047371
    transient_lastDdlTime 1370273911

    # Storage Information
    SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    InputFormat: org.apache.hadoop.mapred.TextInputFormat
    OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    Compressed: No
    Num Buckets: -1
    Bucket Columns: []
    Sort Columns: []
    Storage Desc Params:
    field.delim \u0001
    serialization.format \u0001
    Time taken: 0.409 seconds


  • Alex Minnaar at Jun 3, 2013 at 7:03 pm
    I am actually using Impala because my Hive queries take a very long time
    and Impala seems to be faster. I am running the query you suggested, but
    it might take some time ;(

  • Alex Minnaar at Jun 3, 2013 at 7:16 pm
    Yeah, it's taking forever. I don't think I can run this query. Any other
    suggestions?

  • Ricky Saltzer at Jun 3, 2013 at 7:25 pm
    Could you provide a little more information on this table? For example, how
    was this table created and populated? Usually when you see all your data in
    one column, it's because the field delimiter is incorrectly set and so
    Impala (or Hive) is not able to correctly parse the columns.
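
    A minimal sketch of that mismatch, assuming a tab-delimited file read by a
    table whose field.delim is \u0001 (Ctrl-A, Hive's default for text
    tables). This is only a simulation, not the actual Impala or Hive parser;
    the helper name split_row and the sample row are made up for illustration.

```python
# Simulates how a text-format table splits one raw line into columns.
# split_row and the sample row are hypothetical, for illustration only.

CTRL_A = "\u0001"  # Hive's default field delimiter for text tables


def split_row(line, delimiter, num_columns):
    """Split a raw line on the delimiter and pad or trim to the column count."""
    fields = line.rstrip("\n").split(delimiter)
    # Columns with no data come back empty (None here, NULL in a query).
    fields += [None] * (num_columns - len(fields))
    return fields[:num_columns]


raw_line = "a\tb\tc\td\te\n"  # one tab-delimited row with five values

# Table metadata says Ctrl-A but the file uses tabs:
# the whole row lands in the first column and the other four are empty.
print(split_row(raw_line, CTRL_A, 5))

# Delimiter matches the file: five separate columns.
print(split_row(raw_line, "\t", 5))
```

    With the Ctrl-A delimiter the line is never split, which matches the
    symptom of all data in the first column and nothing in the remaining four.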



  • Alan Jackoway at Jun 3, 2013 at 7:27 pm
    Hello,

    When you tried to run the Hive query, did you see any status output? If
    Hive kicks off a MapReduce job, it should report the completion percentage
    of that job.

    How much data is in your table? Can you create another table with a subset
    of the data so that queries will finish faster?

    Alan


  • Alex Minnaar at Jun 3, 2013 at 7:36 pm
    I was able to run 'select * from table limit 10' in the Hive web UI, and it
    produced a partly correct result. The rows are split correctly across the
    five columns, but four blank columns appear to be appended at the end,
    which is weird. See the attached screenshot.

  • Alex Minnaar at Jun 3, 2013 at 7:40 pm
    The original text file was uploaded to HDFS via the Hue file browser. Then
    I used the 'create a new table from a file' link in the Metastore Manager
    and followed the appropriate steps. When I look at the sample for the
    table, it looks correct (all rows are separated correctly). I don't know
    how I would upload the original text file via the command line because I am
    running Hadoop on top of Amazon EC2, so I would first have to get the file
    into EC2 somehow.

  • Ricky Saltzer at Jun 3, 2013 at 7:44 pm
    Alex -

    What was the format of the original text file you uploaded via Hue? Was it
    comma separated, tab separated, or something else?
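
    If the format is not known offhand, a rough check on a few raw lines can
    suggest it. The sketch below is only a heuristic under stated assumptions:
    the candidate list, the helper name guess_delimiter, and the sample rows
    are all hypothetical, and it simply picks the candidate character that
    occurs the same nonzero number of times on every sampled line.

```python
# Heuristic delimiter check on a few raw lines of a text file.
# CANDIDATES, guess_delimiter, and the sample rows are hypothetical.

CANDIDATES = {"tab": "\t", "comma": ",", "ctrl-a": "\u0001"}


def guess_delimiter(lines):
    """Return the name of the most frequent candidate that has the same
    nonzero count on every line, or None if no candidate qualifies."""
    best_name, best_count = None, 0
    for name, char in CANDIDATES.items():
        counts = {line.count(char) for line in lines}
        # A real delimiter should appear equally often on every sampled line.
        if len(counts) == 1:
            count = counts.pop()
            if count > best_count:
                best_name, best_count = name, count
    return best_name


sample = ["a\tb\tc\td\te", "1\t2\t3\t4\t5"]  # made-up rows with five fields
print(guess_delimiter(sample))
```

    The result is only a hint; values that themselves contain commas or tabs
    can throw the counts off, so it is worth eyeballing the lines as well.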

    Thanks,
    Ricky

    On Mon, Jun 3, 2013 at 3:40 PM, Alex Minnaar wrote:

    The original text file was uploaded to hdfs via the Hue file browser.
    Then I used the 'create a new table from a file' link in the Metastore
    Manager and followed the appropriate steps. When I look at the sample for
    the table, it looks correct (all rows are separated correctly). I don't
    know how I would upload the original text file via command line because I
    am running Hadoop ontop of Amazon ec2 so I would first have to get the file
    into ec2 somehow.

    On Monday, June 3, 2013 3:25:52 PM UTC-4, Ricky Saltzer wrote:

    Could you provide a little more information on this table? For example,
    how was this table created and populated? Usually when you see all your
    data in one column, it's because the field delimiter is incorrectly set and
    so Impala (or Hive) is not able to correctly parse the columns.



    On Mon, Jun 3, 2013 at 3:16 PM, Alex Minnaar wrote:

    Yeah, its taking forever, I don't think I can run this query. Any other
    suggestions?

    On Monday, June 3, 2013 2:57:31 PM UTC-4, Ricky Saltzer wrote:

    Hey Alex -

    I should have also probably asked if you can query this table in Hive
    okay...

    Try:
    $ hive
    $ select * from <table_name> LIMIT 10;

    How does the output look?

    On Mon, Jun 3, 2013 at 2:55 PM, Alex Minnaar wrote:

    The result of that is

    # col_name data_type comment

    one string None
    two string None
    three string None
    four string None
    five string None

    # Detailed Table Information
    Database: default
    Owner: hdfs
    CreateTime: Mon Jun 03 15:38:28 UTC 2013
    LastAccessTime: UNKNOWN
    Protect Mode: None
    Retention: 0
    Location: hdfs://ip-10-245-112-238.us-**we**
    st-2.compute.internal:8020/**use**r/hive/warehouse/text_try
    Table Type: MANAGED_TABLE
    Table Parameters:
    numFiles 1
    numPartitions 0
    numRows 0
    rawDataSize 0
    totalSize 48047371
    transient_lastDdlTime 1370273911

    # Storage Information
    SerDe Library: org.apache.hadoop.hive.**serde2**
    .lazy.LazySimpleSerDe
    InputFormat: org.apache.hadoop.mapred.**Text**InputFormat
    OutputFormat: org.apache.hadoop.hive.ql.io.**H**
    iveIgnoreKeyTextOutputFormat
    Compressed: No
    Num Buckets: -1
    Bucket Columns: []
    Sort Columns: []
    Storage Desc Params:
    field.delim \u0001
    serialization.format \u0001
    Time taken: 0.409 seconds


    On Monday, June 3, 2013 2:42:02 PM UTC-4, Ricky Saltzer wrote:

    Hey Alex -

    Can you open up a Hive shell (*$ hive) *and perform an extended
    describe on the table?

    $ hive
    $ describe formatted <table_name>
    or
    $ describe extended <table_name>

    The formatted table description is nicer to read...

    Thanks!
    Ricky

    On Mon, Jun 3, 2013 at 2:40 PM, Alex Minnaar wrote:

    Ok disregard my last response. I was able to run it in the impala
    shell and got the same result. All data was in the first column, the rest
    were empty.

    On Monday, June 3, 2013 2:16:57 PM UTC-4, Udai wrote:

    Hi Alex,

    What do you see when you run the query on Impala shell?

    Thanks,
    Udai

    On Mon, Jun 3, 2013 at 11:01 AM, Alex Minnaar wrote:

    I have a table in my default database with 5 columns. When I look
    at the sample in the Hue Metastore Manager it looks fine. However, when I
    do a "select * from table" in the Impala query editor, the result has all
    of the data squished into the first column and nothing in the remaining 4.
    Can anyone tell me how to fix this?

    Thanks,

    Alex

    --
    Ricky Saltzer
    Tools Developer
    http://www.cloudera.com


  • Alex Minnaar at Jun 3, 2013 at 7:48 pm
    It was tab delimited
    On Monday, June 3, 2013 3:44:40 PM UTC-4, Ricky Saltzer wrote:

    Alex -

    What was the format of the original text file you uploaded via Hue?
    Was it comma-separated, tab-separated, or something else?

    Thanks,
    Ricky


    On Mon, Jun 3, 2013 at 3:40 PM, Alex Minnaar wrote:

    The original text file was uploaded to HDFS via the Hue file browser.
    Then I used the 'create a new table from a file' link in the Metastore
    Manager and followed the appropriate steps. When I look at the sample for
    the table, it looks correct (all rows are separated correctly). I don't
    know how I would upload the original text file via command line because I
    am running Hadoop on top of Amazon EC2, so I would first have to get the
    file into EC2 somehow.
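
    For the command-line route mentioned above, one possible sketch, assuming
    SSH access to a cluster node, is to copy the file to the node with scp and
    then load it into HDFS with hadoop fs -put. The key file, SSH target and
    HDFS directory below are hypothetical placeholders, and the Python only
    builds and prints the two commands rather than running them:

```python
import shlex


def upload_commands(local_file, key_file, ssh_target, hdfs_dir):
    """Build the two commands: copy to a cluster node, then put into HDFS."""
    # Stage the file under /tmp on the node, keeping its base name.
    remote_tmp = "/tmp/" + local_file.rsplit("/", 1)[-1]
    copy_to_node = ["scp", "-i", key_file, local_file, f"{ssh_target}:{remote_tmp}"]
    # Run "hadoop fs -put" on the node over SSH.
    remote_cmd = "hadoop fs -put " + shlex.quote(remote_tmp) + " " + shlex.quote(hdfs_dir)
    put_in_hdfs = ["ssh", "-i", key_file, ssh_target, remote_cmd]
    return [copy_to_node, put_in_hdfs]


if __name__ == "__main__":
    # All four arguments are made-up example values.
    for cmd in upload_commands("data.tsv", "my-key.pem",
                               "ec2-user@node.example.com", "/user/alex/uploads"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

    Printing the commands first makes it easy to check the paths before
    running anything against the cluster.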

    On Monday, June 3, 2013 3:25:52 PM UTC-4, Ricky Saltzer wrote:

    Could you provide a little more information on this table? For example,
    how was this table created and populated? Usually when you see all your
    data in one column, it's because the field delimiter is incorrectly set and
    so Impala (or Hive) is not able to correctly parse the columns.
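
    To illustrate the point above: Hive text tables default to Ctrl-A (\x01,
    displayed as \u0001) as the field delimiter. The Python below is only a
    simplified model of a delimited-text reader, not Impala's or Hive's actual
    parser, and the sample row is made up. It shows why a tab-delimited row
    ends up entirely in the first column when the table expects Ctrl-A:

```python
CTRL_A = "\x01"  # Hive's default field delimiter for text tables
TAB = "\t"


def split_row(line, delimiter, num_columns):
    """Split one text row on the delimiter, padding missing columns with None."""
    fields = line.rstrip("\n").split(delimiter)
    fields = fields[:num_columns]
    return fields + [None] * (num_columns - len(fields))


row = "a\tb\tc\td\te\n"  # hypothetical tab-delimited row with 5 values

wrong = split_row(row, CTRL_A, 5)  # table says Ctrl-A, file uses tabs
right = split_row(row, TAB, 5)     # table delimiter matches the file

print(wrong)  # whole row in column 1, remaining columns empty
print(right)  # one value per column
```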



    On Mon, Jun 3, 2013 at 3:16 PM, Alex Minnaar wrote:

    Yeah, it's taking forever; I don't think I can run this query. Any
    other suggestions?

    On Monday, June 3, 2013 2:57:31 PM UTC-4, Ricky Saltzer wrote:

    Hey Alex -

    I should have also probably asked if you can query this table in Hive
    okay...

    Try:
    $ hive
    hive> select * from <table_name> LIMIT 10;

    How does the output look?

    On Mon, Jun 3, 2013 at 2:55 PM, Alex Minnaar wrote:

    The result of that is

    # col_name data_type comment

    one string None
    two string None
    three string None
    four string None
    five string None

    # Detailed Table Information
    Database: default
    Owner: hdfs
    CreateTime: Mon Jun 03 15:38:28 UTC 2013
    LastAccessTime: UNKNOWN
    Protect Mode: None
    Retention: 0
    Location: hdfs://ip-10-245-112-238.us-west-2.compute.internal:8020/user/hive/warehouse/text_try
    Table Type: MANAGED_TABLE
    Table Parameters:
    numFiles 1
    numPartitions 0
    numRows 0
    rawDataSize 0
    totalSize 48047371
    transient_lastDdlTime 1370273911

    # Storage Information
    SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    InputFormat: org.apache.hadoop.mapred.TextInputFormat
    OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    Compressed: No
    Num Buckets: -1
    Bucket Columns: []
    Sort Columns: []
    Storage Desc Params:
    field.delim \u0001
    serialization.format \u0001
    Time taken: 0.409 seconds
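
    The relevant part of the output above is "Storage Desc Params":
    field.delim is \u0001 (Ctrl-A), while the file itself is tab-delimited. As
    a rough sketch, the Python below pulls those parameters out of pasted
    "describe formatted" text so the mismatch can be checked. The helper name
    and the two-token parsing rule are assumptions made for illustration, not
    part of Hive:

```python
def storage_params(describe_output):
    """Collect the key/value pairs listed under 'Storage Desc Params:'."""
    params = {}
    in_section = False
    for raw in describe_output.splitlines():
        line = raw.strip()
        if line.startswith("Storage Desc Params:"):
            in_section = True
            continue
        if not in_section:
            continue
        parts = line.split()
        # Parameter lines look like "<key> <value>"; anything else is skipped.
        if len(parts) == 2:
            params[parts[0]] = parts[1]
    return params


# Tail of the output shown above; raw strings keep "\u0001" as literal text.
sample = "\n".join([
    "Sort Columns: []",
    "Storage Desc Params:",
    r"field.delim \u0001",
    r"serialization.format \u0001",
    "Time taken: 0.409 seconds",
])

params = storage_params(sample)
if params.get("field.delim") != r"\t":
    print("field.delim is", params.get("field.delim"), "but the file is tab-delimited")
```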


  • Ricky Saltzer at Jun 3, 2013 at 7:56 pm
    Hey Alex -

    I'm trying to reproduce this on my side. Which version of CDH is this?

  • Alex Minnaar at Jun 3, 2013 at 8:01 pm
    It is the latest version. I resaved the initial file to make sure that it
    is delimited this way. I'll try the process again.
  • Alex Minnaar at Jun 3, 2013 at 8:19 pm
    My Hive commands run unbelievably slowly, so I'm just getting a blinking
    cursor right now. I'm not sure how long it will take.
    On Monday, June 3, 2013 4:09:13 PM UTC-4, Ricky Saltzer wrote:

    Hey Alex -

    I actually think this may be a bug in the new version of Hue, but I think
    I have a workaround for you! Log into one of your nodes via SSH, and open
    up the hive shell ($ hive).

    Now, run the following:

    hive> ALTER TABLE <table_name> SET SERDEPROPERTIES ('field.delim' = '\t');
    hive> ALTER TABLE <table_name> SET SERDEPROPERTIES ('serialization.format' = '\t');

    Then, open up the impala shell ($ impala-shell):

    impala-host:21000> refresh;
    impala-host:21000> SELECT * FROM <table_name> LIMIT 10;

    Let me know if this works...

    Ricky



  • Alex Minnaar at Jun 3, 2013 at 8:29 pm
    Ok I ran the Hive bit in the web UI because my command line seems to take
    forever, and it looks like it has worked. Your help is greatly
    appreciated, thank you
  • Ricky Saltzer at Jun 3, 2013 at 8:31 pm
    Awesome, so the Impala query works fine now?

  • Alex Minnaar at Jun 3, 2013 at 8:34 pm
    Yeah, the data in each row is now in the column it is supposed to be in,
    as opposed to all squished into the first column, which was the problem
    before. Thanks again. I guess I'm going to have to do this for each table
    in the future.
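
    If the same fix has to be repeated for several tables, a small script can
    generate the statements. The Python below is a minimal sketch that only
    prints the two ALTER TABLE statements from the workaround for each table
    name. The function name, the table list and the name check are assumptions
    made for illustration:

```python
import re


def delimiter_fix_statements(table_names, delimiter=r"\t"):
    """Return the ALTER TABLE statements that set the field delimiter per table."""
    statements = []
    for name in table_names:
        # Accept only plain or database-qualified identifiers.
        if not re.fullmatch(r"[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)?", name):
            raise ValueError(f"unexpected table name: {name!r}")
        for prop in ("field.delim", "serialization.format"):
            statements.append(
                f"ALTER TABLE {name} SET SERDEPROPERTIES ('{prop}' = '{delimiter}');"
            )
    return statements


if __name__ == "__main__":
    tables = ["text_try", "another_table"]  # hypothetical list of affected tables
    print("\n".join(delimiter_fix_statements(tables)))
```

    The printed statements can be pasted into the Hive query editor or the
    hive shell; Impala still needs the refresh step afterwards, as in the
    workaround.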
    On Monday, June 3, 2013 4:31:14 PM UTC-4, Ricky Saltzer wrote:

    Awesome, so the Impala query works fine now?


    On Mon, Jun 3, 2013 at 4:29 PM, Alex Minnaar <minna...@gmail.com<javascript:>
    wrote:
    Ok I ran the Hive bit in the web UI because my command line seems to take
    forever, and it looks like it has worked. Your help is greatly
    appreciated, thank you

    On Monday, June 3, 2013 4:09:13 PM UTC-4, Ricky Saltzer wrote:

    Hey Alex -

    I actually think this may be a bug in the new version of Hue, but I
    think I have a workaround for you! Log into one of your nodes via SSH, and
    open up the hive shell ($ hive).

    Now, run the following:

    hive> ALTER TABLE <table_name> SET SERDEPROPERTIES ('field.delim' = '\t');
    hive> ALTER TABLE <table_name> SET SERDEPROPERTIES ('serialization.format' = '\t');

    Then, open up the impala shell ($ impala-shell):

    impala-host:21000> refresh;
    impala-host:21000> SELECT * FROM <table_name> LIMIT 10;
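
If several tables need the same workaround, the two ALTER TABLE statements above can be generated per table rather than typed by hand. The following is only a sketch: the helper name `delimiter_fix_statements` and the table name `another_table` are made up for illustration, and it assumes every table listed is tab delimited like the one in this thread.

```python
def delimiter_fix_statements(table_names, delimiter="\\t"):
    """Build the two SERDEPROPERTIES statements from the workaround for each table.

    `delimiter` is the literal text placed inside the HiveQL quotes; the
    default is backslash followed by t, which Hive reads as a tab.
    """
    statements = []
    for name in table_names:
        for prop in ("field.delim", "serialization.format"):
            statements.append(
                f"ALTER TABLE {name} SET SERDEPROPERTIES ('{prop}' = '{delimiter}');"
            )
    return statements


# Hypothetical table list; replace with your own table names.
for statement in delimiter_fix_statements(["text_try", "another_table"]):
    print(statement)
```

The printed statements can be pasted into the hive shell, or saved to a file and run with `hive -f <file>`. The Impala refresh step above is still needed afterwards.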

    Let me know if this works...

    Ricky


    On Mon, Jun 3, 2013 at 4:01 PM, Alex Minnaar wrote:

    It is the latest version. I resaved the initial file to make sure that
    it is delimited this way. I'll try the process again.

    On Monday, June 3, 2013 3:56:43 PM UTC-4, Ricky Saltzer wrote:

    Hey Alex -

    I'm trying to reproduce on my side, which version of CDH is this?

    On Mon, Jun 3, 2013 at 3:48 PM, Alex Minnaar wrote:

    It was tab delimited

    On Monday, June 3, 2013 3:44:40 PM UTC-4, Ricky Saltzer wrote:

    Alex -

    The original text file you uploaded via Hue, what was the format of
    it? Was it comma separated, tab separated, or something else?

    Thanks,
    Ricky

    On Mon, Jun 3, 2013 at 3:40 PM, Alex Minnaar wrote:

    The original text file was uploaded to HDFS via the Hue file
    browser. Then I used the 'create a new table from a file' link in the
    Metastore Manager and followed the appropriate steps. When I look at the
    sample for the table, it looks correct (all rows are separated correctly).
    I don't know how I would upload the original text file via command line
    because I am running Hadoop on top of Amazon EC2, so I would first have to
    get the file into EC2 somehow.

    On Monday, June 3, 2013 3:25:52 PM UTC-4, Ricky Saltzer wrote:

    Could you provide a little more information on this table? For
    example, how was this table created and populated? Usually when you see all
    your data in one column, it's because the field delimiter is incorrectly
    set and so Impala (or Hive) is not able to correctly parse the columns.
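
To illustrate the point above, here is a minimal Python sketch (plain string splitting, not Impala or Hive code) of how one tab-delimited row parses under two different delimiters. The sample row and the `split_row` helper are made up for illustration. `\x01` (Ctrl-A) is the default field delimiter Hive uses for text tables when none is specified.

```python
# Hypothetical sample row: five tab-separated fields, as in a TSV file.
ROW = "a\tb\tc\td\te"
NUM_COLUMNS = 5


def split_row(line, delimiter, num_columns=NUM_COLUMNS):
    """Split a text row on `delimiter` and pad missing columns with None."""
    fields = line.split(delimiter)[:num_columns]
    return fields + [None] * (num_columns - len(fields))


# The row contains no Ctrl-A character, so the whole line lands in column one
# and the remaining four columns come back empty.
wrong = split_row(ROW, "\x01")

# With tab as the delimiter, each field lands in its own column.
right = split_row(ROW, "\t")

print(wrong)
print(right)
```

The first result mirrors the symptom described in this thread: all of the data in the first column and nothing in the other four.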



    On Mon, Jun 3, 2013 at 3:16 PM, Alex Minnaar wrote:

    Yeah, it's taking forever. I don't think I can run this query.
    Any other suggestions?

    On Monday, June 3, 2013 2:57:31 PM UTC-4, Ricky Saltzer wrote:

    Hey Alex -

    I should have also probably asked if you can query this table in
    Hive okay...

    Try:
    $ hive
    hive> select * from <table_name> LIMIT 10;

    How does the output look?


    On Mon, Jun 3, 2013 at 2:55 PM, Alex Minnaar <minna...@gmail.com> wrote:
    The result of that is

    # col_name data_type comment

    one string None
    two string None
    three string None
    four string None
    five string None

    # Detailed Table Information
    Database: default
    Owner: hdfs
    CreateTime: Mon Jun 03 15:38:28 UTC 2013
    LastAccessTime: UNKNOWN
    Protect Mode: None
    Retention: 0
    Location: hdfs://ip-10-245-112-238.us-west-2.compute.internal:8020/user/hive/warehouse/text_try
    Table Type: MANAGED_TABLE
    Table Parameters:
    numFiles 1
    numPartitions 0
    numRows 0
    rawDataSize 0
    totalSize 48047371
    transient_lastDdlTime 1370273911

    # Storage Information
    SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    InputFormat: org.apache.hadoop.mapred.TextInputFormat
    OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    Compressed: No
    Num Buckets: -1
    Bucket Columns: []
    Sort Columns: []
    Storage Desc Params:
    field.delim \u0001
    serialization.format \u0001
    Time taken: 0.409 seconds


    On Monday, June 3, 2013 2:42:02 PM UTC-4, Ricky Saltzer wrote:

    Hey Alex -

    Can you open up a Hive shell ($ hive) and perform an
    extended describe on the table?

    $ hive
    hive> describe formatted <table_name>;
    or
    hive> describe extended <table_name>;

    The formatted table description is nicer to read...

    Thanks!
    Ricky


    On Mon, Jun 3, 2013 at 2:40 PM, Alex Minnaar <
    minna...@gmail.com> wrote:
    Ok disregard my last response. I was able to run it in the
    impala shell and got the same result. All data was in the first column,
    the rest were empty.

    On Monday, June 3, 2013 2:16:57 PM UTC-4, Udai wrote:

    Hi Alex,

    What do you see when you run the query on Impala shell?

    Thanks,
    Udai


    On Mon, Jun 3, 2013 at 11:01 AM, Alex Minnaar <
    minna...@gmail.com> wrote:
    I have a table in my default database with 5 columns. When
    I look at the sample in the Hue Metastore Manager it looks fine. However,
    when I do a "select * from table" in the Impala query editor, the result
    has all of the data squished into the first column and nothing in the
    remaining 4. Can anyone tell me how to fix this?

    Thanks,

    Alex

    --
    Ricky Saltzer
    Tools Developer
    http://www.cloudera.com



Discussion Overview
group: impala-user
categories: hadoop
posted: Jun 3, '13 at 6:01p
active: Jun 3, '13 at 8:34p
posts: 21
users: 4
website: cloudera.com
irc: #hadoop
