Using space as field separator fails. How do I fix this?
Hi guys

I'm having a problem: I'm reading a file where fields are terminated
by space (' ', ascii 32) into a table. I'm not making these files
so I can't easily change this use of ' ' as field separator.

DROP TABLE logdata;

CREATE EXTERNAL TABLE logdata(
xxx STRING,
yyy STRING,
...
z_t)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/somewhere/over/the/rainbow.dta' OVERWRITE INTO
TABLE logdata;


This fails: All the data is read into the first field (xxx). If I
change the field separator to something else, e.g. "," things work
normally and I get to read the fields into their proper places
in the record, but then I have to edit the datafiles first and I don't
really want to do that.

Do you know how I can most easily read my logfiles?

Bjørn



--

(Rmz)


  • Bjørn Remseth at Apr 4, 2011 at 9:52 am
    I forgot to mention: this is with Hive 0.7.0.

    --

    (Rmz)
  • Harsh Chouraria at Apr 4, 2011 at 10:30 am
    Also, please use user@hive.apache.org for further Hive queries. Hive
    mailing lists info: http://hive.apache.org/mailing_lists.html#Users

    This ML is for the user discussion of Hadoop's common components. Thank you.

    --
    Harsh J
    Support Engineer, Cloudera
  • Bjørn Remseth at Apr 4, 2011 at 12:29 pm
    Ok, thanks. And your tip worked :)


    --

    (Rmz)
  • Kevin Leach at Apr 4, 2011 at 1:12 pm
    Is there a better place than common-user for my question about fixed length separators?

    I am currently using a space separator for hadoop streaming using
    -Dstream.map.output.field.separator=' ' \
    -Dstream.num.map.output.key.fields=1 \

    Is there a way to fix the length at 13 bytes using the command line or do I need to write my own fixed length separator routine?

    Thanks,
    Kevin
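
If no command-line option turns out to fit, one route is a small mapper that cuts the key off itself. The sketch below is only an illustration, not something proposed in this thread: the names `KEY_WIDTH`, `split_fixed` and `run` are made up, and it assumes each input record is one line whose first 13 bytes form the key. It writes key and value separated by a tab, which is the default key/value separator for streaming mapper output.

```python
#!/usr/bin/env python3
# Hypothetical streaming mapper: the first KEY_WIDTH bytes of each record are the key.
import sys

KEY_WIDTH = 13  # fixed key length in bytes, taken from the question above


def split_fixed(record, width=KEY_WIDTH):
    """Return (key, value): key is the first `width` bytes, value is the rest."""
    return record[:width], record[width:]


def run(stdin=None, stdout=None):
    """Read records line by line and emit key<TAB>value lines."""
    stdin = sys.stdin.buffer if stdin is None else stdin
    stdout = sys.stdout.buffer if stdout is None else stdout
    for raw in stdin:
        record = raw.rstrip(b"\r\n")
        if not record:
            continue  # skip empty lines
        key, value = split_fixed(record)
        stdout.write(key + b"\t" + value + b"\n")


if __name__ == "__main__":
    run()
```

With a mapper like this the key ends at the tab it writes, so the space separator setting from the question would presumably no longer be needed. That has not been tested against a real job here.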

  • Harsh Chouraria at Apr 4, 2011 at 10:22 am
    Hello Bjørn,

    As documented at
    http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL, you may give the
    field terminator as the octal code of the character. Space is ASCII 32,
    which is 040 in octal.

    So the following should work:

    CREATE EXTERNAL TABLE logdata(
    xxx STRING,
    yyy STRING,
    ...
    z_t)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\040'
    STORED AS TEXTFILE;

    --
    Harsh J
    Support Engineer, Cloudera
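
To see that the octal escape names the same character as a literal space, here is a small Python sketch. It is plain Python, not Hive, and the sample record is made up for illustration. It checks that octal 040 is character code 32 and splits a line on that single character.

```python
# Illustration only: plain Python, not Hive. The sample record is hypothetical.
SPACE = "\040"  # octal escape; Python reads \040 as character code 32


def split_fields(line, terminator=SPACE):
    """Split one record on a single-character field terminator."""
    return line.rstrip("\n").split(terminator)


if __name__ == "__main__":
    print(ord(SPACE), oct(ord(SPACE)))
    print(split_fields("2011-04-04 10:22:00 GET /index.html\n"))
```

This only shows that the two spellings refer to the same byte. It does not explain why Hive 0.7.0 treated the literal ' ' differently in the original DDL.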
  • Hadoopman at Apr 5, 2011 at 5:11 am
    I had a similar problem, though my logs were terminated with a carriage
    return. Many of the fields in my logs are delimited with a space. We
    tried using \s, but that basically removed every instance of the letter s
    (yeah, I thought that was amusing too). In some cases we were able to use
    \\t, but that didn't seem to work very well with our logs. We are now
    using the regex SerDe with a hand-built regex as the delimiter to
    make it work. So far so good. Perhaps this is where you need to go.
    I'm still learning how that works myself. Exciting stuff!!
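
The idea behind a regex-based approach can be tried out locally before touching Hive. The sketch below is a rough Python illustration, not the regex used in the post above (which is not shown in the thread). The pattern `LINE_RE`, the helper `parse_line`, and the three-column log layout are all assumptions. Each capture group stands for one column, and the last group keeps any spaces inside the message.

```python
import re

# Hypothetical log layout: timestamp, level, then a free-text message
# that may itself contain spaces.
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(.*)$")


def parse_line(line):
    """Return the captured columns as a tuple, or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    return match.groups() if match else None


if __name__ == "__main__":
    # The trailing carriage return is stripped before matching.
    print(parse_line("2011-04-05T05:11:00 INFO job finished in 42 s\r\n"))
```

A plain single-character delimiter would split the message into many fields, whereas the final `(.*)` group keeps it together.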



  • Alex Kozlov at Apr 5, 2011 at 5:20 am
    Try using octal, i.e. '\040'.

  • Hadoopman at Apr 5, 2011 at 5:23 am
    Great tip. I'll give it a try.

    Thanks!




Discussion Overview
group: common-user
category: hadoop
posted: Apr 4, '11 at 9:51a
active: Apr 5, '11 at 5:23a
posts: 9
users: 5
website: hadoop.apache.org...
irc: #hadoop
