Hi, Hive experts,

Can you see what I am doing wrong? As a simple test of breaking a text
into words and putting those words into a table, I am doing this:

CREATE EXTERNAL TABLE books1
(
words string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "\\W")
STORED AS TextFile;

LOAD DATA INPATH '/test-data/ch1/moby-dick.txt' OVERWRITE INTO TABLE
books1;

This SerDe works in Java code, but in Hive I am getting all nulls in the
books1 table.

Thank you,
Mark

  • Vijay at Sep 28, 2011 at 4:19 am
    There are a couple of problems. First, input.regex needs to be
    "(\\w+)". Note the case: lowercase \w matches word characters, while
    the uppercase \W in your pattern matches non-word characters.
    The bigger problem, though, is that with this SerDe (and most others)
    you get only one row per line of input, so multiple words on a line
    cannot produce multiple rows. Probably the best option is to parse
    the text file, generate a new file with each word on its own line,
    and then load that file into Hive.
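
As a rough illustration of both points, here is a minimal Python sketch. Python is used only for illustration; Hive itself uses Java regular expressions. The sketch assumes the SerDe requires the pattern to match the entire line and fills one column per capture group, returning NULL otherwise. The helper names (regex_serde_row, words_in, write_one_word_per_line) are hypothetical and not part of Hive.

```python
import re

WORD = re.compile(r"\w+")


def regex_serde_row(pattern, line, num_columns=1):
    """Approximate how a regex-based SerDe maps one line to one row.

    Assumption: the pattern must match the whole line, and each capture
    group fills one column; otherwise every column is NULL (None here).
    """
    match = re.fullmatch(pattern, line)
    if match is None or len(match.groups()) < num_columns:
        return [None] * num_columns
    return list(match.groups()[:num_columns])


def words_in(lines):
    """Yield every word found in an iterable of text lines, in order."""
    for line in lines:
        yield from WORD.findall(line)


def write_one_word_per_line(src_path, dst_path):
    """Read src_path and write each word on its own line to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for word in words_in(src):
            dst.write(word + "\n")


line = "Call me Ishmael."
# Original pattern: a single non-word character cannot match a whole line.
print(regex_serde_row(r"\W", line))
# Corrected pattern: still one row per line, and a multi-word line fails.
print(regex_serde_row(r"(\w+)", line))
# After preprocessing, each line is a single word and matches.
print([regex_serde_row(r"(\w+)", w) for w in words_in([line])])
```

With a words file generated this way (for example, a hypothetical moby-dick-words.txt), each line already holds exactly one word, so a table with a single string column should be loadable without any custom SerDe.
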

    Hope that helps,
    Vijay
  • Mark Kerzner at Sep 28, 2011 at 4:27 am
    Thank you, Vijay.

    I was beginning to understand things that way myself, and you made it
    perfectly clear.

    Sincerely,
    Mark

Posted: Sep 28, 2011 at 1:46 am
Last active: Sep 28, 2011 at 4:27 am