Grokbase Groups Hive user May 2011
My data format is as follows:

a&&&b
c&&&b^^xyz
c&&&d^^hdo

create table f(str1 string, str2 string) ROW FORMAT SERDE
'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
With SERDEPROPERTIES (
"input.regex"="(.+)&&&(.+)(\^\^.+)?"
)

My aim is:
a b
c b
c d
However, I get:
a b
c b^^xyz
c d^^hdo

How should I fix the regex to get the right result?
Thank you for your help.
--
dujinhang

  • 皮皮 at May 30, 2011 at 9:21 am
    Maybe change the regex to:
    "input.regex"="(.+)&&&(.+)\^\^.+?"

    ------------------ Original ------------------
    From: "jinhang du"<dujinhang@gmail.com>;
    Date: Mon, May 30, 2011 05:07 PM
    To: "user"<user@hive.apache.org>;

    Subject: Question about create hive tables.


  • Jinhang du at May 30, 2011 at 10:03 am
    The "^^" suffix may not appear in every line of my data, so your regular
    expression doesn't meet my need.
    I tried it and the result is not right.
    Thank you.



    --
    dujinhang
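
To see why that pattern does not fit the data, here is a minimal Python sketch. It assumes RegexSerDe requires the whole line to match (as Java's `Matcher.matches()` does), which `re.fullmatch` imitates here; the helper name `parse` is only illustrative, and the sample lines are the ones from the original post.

```python
import re

# Pattern from the first reply: the "^^" part is mandatory here.
MANDATORY_SUFFIX = re.compile(r"(.+)&&&(.+)\^\^.+?")

lines = ["a&&&b", "c&&&b^^xyz", "c&&&d^^hdo"]


def parse(pattern, line):
    """Return the captured groups, or None when the whole line does not match."""
    m = pattern.fullmatch(line)
    return m.groups() if m else None


# Map each input line to the columns the pattern would extract from it.
results = {line: parse(MANDATORY_SUFFIX, line) for line in lines}
for line, cols in results.items():
    print(line, "->", cols)
```

The line without "^^" gets no match at all, while the other two split as intended. As far as I know, Hive's RegexSerDe returns NULL columns for rows that do not match, which would explain the wrong result.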
  • YUYANG LAN at May 30, 2011 at 1:46 pm
    Hi, how about this?

    (.+)&&&(.+?)(?:\^\^.*)?
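
The difference from the original pattern can be checked outside Hive. The sketch below is in Python rather than Java, on the assumption that greedy and lazy quantifiers and non-capturing groups behave the same way in both engines for these patterns; `re.fullmatch` stands in for a whole-line match, and `columns` is a hypothetical helper.

```python
import re

# Original pattern: the greedy second group swallows the "^^..." suffix,
# so the optional third group never gets anything to match.
GREEDY = re.compile(r"(.+)&&&(.+)(\^\^.+)?")

# Suggested pattern: lazy second group, optional non-capturing suffix.
LAZY = re.compile(r"(.+)&&&(.+?)(?:\^\^.*)?")

lines = ["a&&&b", "c&&&b^^xyz", "c&&&d^^hdo"]


def columns(pattern, line):
    """Return the capture groups for a whole-line match, or None."""
    m = pattern.fullmatch(line)
    return m.groups() if m else None


greedy_rows = [columns(GREEDY, line) for line in lines]
lazy_rows = [columns(LAZY, line) for line in lines]

for line, g, l in zip(lines, greedy_rows, lazy_rows):
    print(f"{line!r}: greedy={g} lazy={l}")
```

With the lazy group, the second column stops before "^^" when the suffix is present and still matches when it is absent. When pasting the pattern into SERDEPROPERTIES, the backslashes probably need doubling (`\\^\\^`), the same way the Hive wiki's Apache log example writes `\\[`; that is an assumption worth checking on your Hive version.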


    --
    -------------------------------------------------------
    DAVID RAN UYOU //
  • Jinhang du at May 31, 2011 at 7:01 am
    How do the columns in the table map to the "input.regex"?

    In other words, which part of the regex corresponds to which column of the table?

    Could anybody offer some help?



    --
    dujinhang
  • 김영우 at May 31, 2011 at 8:35 am
    Hi dujinhang,

    See http://wiki.apache.org/hadoop/Hive/UserGuide

    add jar ../build/contrib/hive_contrib.jar;

    CREATE TABLE apachelog (
    host STRING,
    identity STRING,
    user STRING,
    time STRING,
    request STRING,
    status STRING,
    size STRING,
    referer STRING,
    agent STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
    "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
    "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
    )
    STORED AS TEXTFILE;
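
On the question of which part of the regex goes to which column: as I understand it, each capturing group `( ... )` is assigned to a table column in order, and non-capturing groups `(?: ... )` are skipped. The Python sketch below illustrates that mapping with the pattern above (written without the Hive string escaping). The sample log line and the `to_row` helper are made up for illustration.

```python
import re

# Column names in the same order as in the CREATE TABLE statement.
COLUMNS = ["host", "identity", "user", "time", "request",
           "status", "size", "referer", "agent"]

# The input.regex above, with the Hive string-literal escapes removed.
LOG_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") '
    r'(-|[0-9]*) (-|[0-9]*)(?: ([^ "]*|"[^"]*") ([^ "]*|"[^"]*"))?'
)


def to_row(line):
    """Pair the n-th capturing group with the n-th column name."""
    m = LOG_PATTERN.fullmatch(line)
    if m is None:
        return None
    return dict(zip(COLUMNS, m.groups()))


# Hypothetical log line without the optional referer/agent fields.
sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326')
row = to_row(sample)
for name in COLUMNS:
    print(name, "=", row[name])
```

The pattern has nine capturing groups for nine columns; the optional trailing part is wrapped in a non-capturing group, so it does not shift the numbering, and its two inner groups are simply empty when the fields are missing.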


    Maybe you need to set the 'output.format.string' property.

    HTH,

    - Youngwoo

